Contextual-value approach to the generalized measurement of observables 
We present a detailed motivation for and definition of the contextual values of an observable, 
which were introduced by Dressel et al., Phys. Rev. Lett. 104, 240401 (2010). The theory of 
contextual values is a principled approach to the generalized measurement of observables. It extends 
the well-established theory of generalized state measurements by bridging the gap between partial 
state collapse and the observables that represent physically relevant information about the system. 
To emphasize the general utility of the concept, we first construct the full theory of contextual 
values within an operational formulation of classical probability theory, paying special attention to 
observable construction, detector coupling, generalized measurement, and measurement disturbance. 
We then extend the results to quantum probability theory built as a superstructure on the classical 
theory, pointing out both the classical correspondences to and the full quantum generalizations of 
both Lüders' rule and the Aharonov-Bergmann-Lebowitz rule in the process. As such, our treatment 
doubles as a self-contained pedagogical introduction to the essential components of the operational 
formulations for both classical and quantum probability theory. We find in both cases that the 
contextual values of a system observable form a generalized spectrum that is associated with the 
independent outcomes of a partially correlated and generally ambiguous detector; the eigenvalues are 
a special case when the detector is perfectly correlated and unambiguous. To illustrate the approach, 
we apply the technique to both a classical example of marble color detection and a quantum example 
of polarization detection. For the quantum example we detail two devices: Fresnel reflection from a 
glass coverslip, and continuous beam displacement from a calcite crystal. We also analyze the three- 
box paradox to demonstrate that no negative probabilities are necessary in its analysis. Finally, 
we provide a derivation of the quantum weak value as a limit point of a pre- and postselected 
conditioned average and provide sufficient conditions for the derivation to hold. 

PACS numbers: 03.65.Ta,03.67.-a,02.50.Cw,03.65.Ca 



I. INTRODUCTION 

Since the advent of quantum mechanics, practition- 
ers have struggled with an inherent conceptual dual- 
ism in its formalism. On one hand, time evolution of 
a quantum state is a continuous, deterministic, and re- 
versible process well described by a wave equation. On 
the other hand, there is irreducible stochasticity present 
in the measurement process that leads to discontinuous 
and generally irreversible state evolution in the form of 
so-called "quantum jumps" or "state collapse." 

To cope with the necessary introduction of the stochas- 
tic element of the theory while still preserving ties with 
the deterministic classical mechanics, traditional quantum
mechanics [1, 2] emphasizes the role of Hermitian
observable operators that are analogous to classical ob- 
servables. Indeed, we find that observables underlie most 
of the core concepts in the quantum theory: commutation 
relations of observables, complete sets of commuting ob- 
servables, spectral expansions of observables, conjugate 
pairs of observables, expectation values of observables, 
uncertainty relations between observables, and time evo- 
lution generated by a Hamiltonian observable. Even the 
quantum state is introduced as a superposition of observ- 
able eigenvectors. The stochasticity of the theory man- 
ifests itself as a single prescription for how to average 
the omnipresent observables under a deterministically 
evolving quantum state: the implicit projective quantum
jumps corresponding to laboratory measurements
are largely hidden by the formalism.

Experimental control of quantum systems has im- 
proved since the early days of quantum mechanics, how- 
ever, so the discontinuous evolution present in the mea- 
surement process can now be more carefully investigated. 
Modern optical and condensed matter systems, for example,
can condition the evolution of a state on the outcomes
of weakly coupled measurement devices (e.g., [3–5]),
resulting in nonprojective quantum jumps that alter
the state more gently, or even resulting in continuous 
controlled evolution of the state. Since observables are 
defined in terms of projective jumps that strongly af- 
fect the state, it becomes unclear how to correctly apply 
a formalism based on observables to such nonprojective 
measurements. A refinement of the traditional formalism 
must be employed to correctly describe the general case. 

To address this need, the theory of quantum opera- 
tions, or generalized measurement, was introduced in the 
early 1970s by Davies [6] and Kraus [7], and has been
developed over the past forty years to become a compre- 
hensive and mathematically rigorous theory [8-16]. The 
formalism of quantum operations has seen the most use 
in quantum optics, quantum computation, and quantum 
information communities, where it is indispensable and 
well-supported by experiment. However, it has not yet 
seen wide adoption outside of those communities. 

Unlike the traditional observable formalism, the for- 
malism of quantum operations emphasizes the states. 
Observables are mentioned infrequently in the quantum 
operations literature, usually appearing only in the context
of projective measurements where they are well-
understood. Some references (e.g. [3, [H) de- 
fine "generalized observables" in terms of the general- 
ized measurements and detector outcome labels, but give 
no indication about their relationship to traditional ob- 
servables, if any. As a result, there is a conceptual gap 
between the traditional quantum mechanics of observ- 
ables and the modern treatment of quantum operations 
that encompasses a much larger class of possible measure- 
ments than the traditional observables seemingly allow. 

A possible response to this conceptual gap is to declare 
that traditional observables are meaningless outside the 
context of projective measurements. This argument is 
supported by the fact that any generalized measurement 
can be understood as a part of a projective measurement 
being made on a larger joint system that can be associ- 
ated with a traditional observable in the usual way (i.e. 
fl§L p. 20]). However, there has been parallel research 
into the "weak measurement" of observables [17–33] that
suggests that linking generalized measurements to tradi- 
tional observables may not be such an outlandish idea. 

Weak measurements were introduced as a consequence 
of the von Neumann measurement protocol [2] that
uses an interaction Hamiltonian with variable coupling 
strength to correlate an observable of interest to the gen- 
erator of translations for a continuous meter observable. 
The resulting shift in the meter observable is then used 
to infer information about the observable of interest in a 
nonprojective manner. The technique has been used to 
great effect in the laboratory [34–48] to measure physical
quantities like pulse delays, beam deflections, phase
shifts, polarization, and averaged trajectories. There- 
fore, we conclude that there must be some meaningful 
way to reconcile nonprojective measurements with tradi- 
tional observables more formally. 

The primary purpose of the present work is to detail a 
synthesis between generalized measurements and observ- 
ables that is powerful enough to encompass projective 
measurements, weak measurements, and any strength of 
measurement in between. The formalism of contextual 
values, which we explicitly introduced in [49, 50] and
further developed in [51–53], forms a bridge between the
traditional notion of an observable and the modern the- 
ory of quantum operations. For a concise introduction 
to the topic in the context of the quantum theory, we 
recommend reading our letter [49].

The central idea of the contextual-value formalism is 
that an observable can be completely measured indirectly 
using an imperfectly correlated detector by assigning an 
appropriate set of values to the detector outcomes. The 
assigned set of values generally differs from the set of 
eigenvalues for the observable, and forms a generalized 
spectrum that is associated with the operations of the 
generalized measurement, rather than the spectral pro- 
jections for the observable. Thus, the spectrum that one 
associates with an observable will depend on the context 
of how the measurement is being performed; such an in- 
ability to completely discuss observables without specify- 
ing the full measurement context is reminiscent of Bell- 
Kochen-Specker contextuality [1^, [53 - [59| and motivates 
the name "contextual values." 

The secondary purpose of the present work is to 
demonstrate that the contextual values formalism for 
generalized observable measurement is essentially classi- 
cal in nature. Hence, it has potential applications outside 
the usual scope of the quantum theory. Indeed, we will 
show that any system that can be described by Bayesian 
probability theory can benefit from the contextual-value
formalism. 

Extending contextual values to the quantum theory 
from the classical theory clarifies which features of the 
quantum theory are novel. The quantum theory can be 
seen as an extension of a classical probability space to a 
continuous manifold of incompatible frameworks, where 
each framework is a copy of the original probability space. 
Hence, intrinsically quantum features arise not from the 
observables defined in any particular framework, but in- 
stead from the relative orientations of the incompatible 
frameworks. As we shall see, the differences manifest in 
sequential measurements and conditional measurements 
due to the probabilistic structure of the incompatible 
frameworks, rather than the observables or contextual 
values themselves. 

To keep the paper self-contained with these aims in 
mind, we first develop both the operational approach to 
measurement and the contextual values formalism com- 
pletely within the confines of classical probability theory, 
giving illustrative examples to cement the ideas. We then 
port the formalism to the quantum theory and identify 
the essential differences that arise. Our analysis therefore 
doubles as a pedagogical introduction to the operational 
approaches for both classical and quantum probability 
theory that should be accessible to a wide audience. 

The paper is organized as follows: In Sec. I A we provide
a simple intuitive example to introduce the concept
of contextual values. In Secs. II A through II C we develop
the classical version of the operational approach to
measurement. In Sec. II D we introduce the contextual
values formalism classically and then give several examples
similar to the initial example. In Secs. III A through
III C we generalize the classical operations to quantum
operations and highlight the key differences with explicit
examples. In Sec. III D we apply the contextual values
formalism to the quantum case and show that it is unchanged.
We also specifically address how to treat weak
measurements as a special case of our more general formalism
and provide a derivation of the quantum weak
value in Sec. III E. Finally, we give our conclusions in
Sec. IV.



A. Example: Colorblind Detector 

The idea of the contextual values formalism is deceptively
simple. Its essence can be distilled from the following
classical example of an ambiguous detector: Suppose
we wish to measure a marble that may be colored either 
red or green. A person with normal vision can distin- 
guish the colors unambiguously and so would represent 
an ideal detector for the color state of the marble. A 
partially colorblind person, however, may only estimate 
the color correctly some percentage of the time and so 
would represent an ambiguous detector of the color state 
of the marble. 

If the person is only mildly colorblind, then the estima- 
tions will be strongly correlated to the actual color of the 
marble. The ambiguity would then be perturbative and 
could be interpreted as noise introduced into the mea- 
surement. However, if the person is strongly colorblind, 
then the estimations may be only mildly correlated to 
the actual color of the marble. The ambiguity becomes 
nonperturbative, so the noise dominates the signal in the
measurement. 

We can design an experimental protocol where an ex- 
perimenter holds up a marble and the colorblind person 
gives a thumbs-up if he thinks the marble is green or 
a thumbs-down if he thinks the marble is red. Suppose, 
after testing a large number of known marbles, the exper- 
imenter determines that a green marble correlates with a 
thumbs-up 51% of the time, while a red marble correlates 
with a thumbs-down 53% of the time. The experimental 
outcomes of thumbs-up and thumbs-down are thus only 
weakly correlated with the actual color of the marble. 

Having characterized the detector in this manner, the 
experimenter provides the colorblind person with a very 
large bag of an unknown distribution of colored mar- 
bles. The colorblind person examines every marble, and 
for each one records a thumbs-up or a thumbs-down on 
a sheet of paper, which he then returns to the exper- 
imenter. The experimenter then wishes to reconstruct 
what the average distribution of marble colors in the bag 
must be, given only the ambiguous output of his color- 
blind detector. 

For simplicity, the clever experimenter decides to associate
the colors with numerical values: $1$ for green ($g$)
and $-1$ for red ($r$). In order to compare the ambiguous
outputs with the colors, he also assigns them different
numerical values: $a$ for thumbs-up ($u$), and $b$ for thumbs-down
($d$). He then writes down the following probability
constraint equations for obtaining the average marble
color, $\langle \mathrm{color} \rangle$, based on what he has observed,

$$
\begin{aligned}
P(u) &= (.51)\,P(g) + (.47)\,P(r), \\
P(d) &= (.49)\,P(g) + (.53)\,P(r), \\
\langle \mathrm{color} \rangle &= (1)\,P(g) + (-1)\,P(r) = a\,P(u) + b\,P(d),
\end{aligned}
\tag{1}
$$

which he can rewrite as a matrix equation in the basis of
the color probabilities $P(g)$ and $P(r)$,

$$
\begin{pmatrix} .51 & .49 \\ .47 & .53 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\tag{2}
$$

After solving this equation, he finds that he must assign
the amplified values $a = 25.5$ and $b = -24.5$ to the outcomes
of thumbs-up and thumbs-down, respectively, in order
to compensate for the detector ambiguity. After doing
so, he can confidently calculate the average color of the
marbles in the large unknown bag using the identity (1).

The classical color observable has eigenvalues of $1$ and
$-1$ that correspond to an ideal measurement. The amplified
values of $25.5$ and $-24.5$ that must be assigned to
the ambiguous detector outcomes are contextual values
for the same color observable. The context of the measurement
is the characterization of the colorblind detector,
which accounts for the degree of colorblindness. The
expansion (1) relates the spectrum of the observable to
its generalized spectrum of contextual values. With this
identity, both an ideal detector and a colorblind detector
can measure the same observable; however, the assigned
values must change depending on the context of the detector
being used.
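
As a minimal sketch of the bookkeeping in this example (it is not part of the original analysis), the following Python snippet solves the two constraint equations for the contextual values and checks that they recover the average color from the thumbs-up frequency alone. It assumes that the two outcome probabilities for each color sum to one, so that the detector is fully specified by the 51% and 53% figures quoted above. The function names and the test bag containing 30% green marbles are hypothetical choices made for illustration.

```python
from fractions import Fraction


def contextual_values(p_up_given_green, p_down_given_red,
                      eig_green=1, eig_red=-1):
    """Solve a*P(u|c) + b*P(d|c) = eigenvalue(c) for both colors c."""
    ug = Fraction(p_up_given_green)   # P(u|g)
    dg = 1 - ug                       # P(d|g), assuming normalization
    dr = Fraction(p_down_given_red)   # P(d|r)
    ur = 1 - dr                       # P(u|r), assuming normalization
    det = ug * dr - dg * ur
    if det == 0:
        raise ValueError("detector outcomes carry no color information")
    # Cramer's rule for the 2x2 system.
    a = (eig_green * dr - eig_red * dg) / det
    b = (eig_red * ug - eig_green * ur) / det
    return a, b


def average_color(p_up, a, b):
    """Estimate <color> from the observed thumbs-up probability."""
    return a * p_up + b * (1 - p_up)


a, b = contextual_values(Fraction(51, 100), Fraction(53, 100))

# Hypothetical bag: 30% green marbles, unknown to the experimenter.
p_green = Fraction(3, 10)
p_up = Fraction(51, 100) * p_green + Fraction(47, 100) * (1 - p_green)
estimate = average_color(p_up, a, b)
print(a, b, estimate)
```

Exact rational arithmetic is used so that the estimate can be compared with $P(g) - P(r)$ without rounding error.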



II. CLASSICAL PROBABILITY THEORY 

To define contextual values more formally, we shall de- 
fine generalized measurements within the classical the- 
ory of probability using the same language as quantum 
operations. In particular, rather than representing the 
observables of classical probability theory in the tradi- 
tional way as functions, we shall adopt a more calcula- 
tionally flexible, yet equivalent, algebraic representation 
that closely resembles the operator algebra for quantum 
observables. 

We also briefly comment that the relevant subset of 
probability theory that is summarized here may slightly 
differ in emphasis from incarnations that the reader may 
have encountered previously. Our treatment acknowl- 
edges that probability theory, in its most general incar- 
nation, is a system of formal reasoning about Boolean 
logic propositions [60, 61]; specifically, our treatment emphasizes
logical inference rather than the traditional frequency
analysis of concrete random variable realizations.
However, the "frequentist" approach of random variables 
is not displaced by the logical approach, but is rather sub- 
sumed as an important special case pertaining to repeat- 
able experiments with logically independent outcomes. 
Due to its clarity and generality, the logical approach has 
been widely adopted in diverse disciplines under the distinct
name "Bayesian probability theory." Several physicists,
including (but certainly not limited to) Jaynes [62],
Caves [63], Fuchs [64], Spekkens [65], Harrigan [66], Wiseman
[16], and Leifer [67], have also extolled its virtues in
recent years. We follow suit to emphasize the generality
of the contextual-value concept.



A. Sample spaces and observables 

In what follows, we shall consider the stage on which
classical probability theory unfolds, namely its space of
observables, to be a commutative algebra over the reals
that we denote $\Sigma_X^{\mathbb{R}}$. This choice of notation is motivated
by the fact that the observable algebra is built
from and contains two related spaces, $X$ and $\Sigma_X$, that
are conceptually distinct and equally important to the
theory. The three are illustrated in Fig. 1 to orient the
discussion. To avoid distracting technical detail, we will
briefly describe finite-dimensional versions of these three
spaces here, and note straightforward generalizations to
the continuous case when needed [68].

FIG. 1. Diagram of the relationship between the sample space
of atomic propositions $X$, the Boolean algebra of propositions
$\Sigma_X$, and the algebra of observables $\Sigma_X^{\mathbb{R}}$. The probability state
$P$ is a measure from $\Sigma_X$ to the interval $[0, 1]$. The expectation
functional $\langle \cdot \rangle$ is a linear extension of $P$ that maps $\Sigma_X^{\mathbb{R}}$ to the
reals $\mathbb{R}$; by construction $\langle \cdot \rangle = P(\cdot)$ whenever both are defined.

Sample spaces. — The core of a probability space is a 
set of mutually exclusive logic propositions, X, known as 
the sample space of atomic propositions. In other words, 
elements of the sample space, such as $g, r \in X$, represent
"yes or no" questions that cannot be answered "yes"
simultaneously and cannot be broken into simpler questions.
For example, $g$ = "Does the marble look green?"
and $r$ = "Does the marble look red?" are valid mutually
exclusive atomic propositions. To be a proper sample 
space, the propositions should form a complete set, mean- 
ing that there must always be exactly one true proposi- 
tion. Physically, such propositions typically correspond 
to mutually independent outcomes of an experiment that 
probes some system of interest. Indeed, any accessible 
physical property must be testable by some experiment, 
and any experiment can be described by such a collection 
of yes or no questions. 

Boolean algebra. — The atomic propositions in X can be 
extended to more complex propositions by logical combination
in order to form the larger space $\Sigma_X$. Specifically,
we can combine them algebraically with a logical OR denoted
by addition and a logical AND denoted by multiplication.
For example, given propositions $x, y, z \in \Sigma_X$,
the quantity $xy + yz$ would denote the proposition "($x$
AND $y$) OR ($y$ AND $z$)." Importantly, both the sum and
the product commute since the corresponding logical operations
commute, and the propositions are idempotent
so $x^2 = x$ for any $x \in \Sigma_X$. Furthermore, the product
of any two nonequal propositions in $X$ must be trivially
false since they are mutually exclusive; we denote the
trivially false proposition as $0$ since its product with any
proposition is also trivially false. Similarly, the sum of all
propositions in $X$ will be trivially true since one of the
atomic propositions must be true by construction; we denote
the trivially true proposition as $1_X$ since its product
with any proposition $x \in X$ leaves that proposition invariant,
$1_X x = x$. The logical operation of NOT, or complementation
($x^c$) with respect to $X$, can then be defined
as the subtraction from the identity $x^c = 1_X - x$ since
$x + x^c = 1_X$ must be true for any proposition $x \in X$ by
definition. The proposition space $\Sigma_X$ contains $X$ and is
closed under the operations of AND, OR, and NOT; hence,
it forms a Boolean logic algebra [69].
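
The algebraic rules above can be checked in a small concrete model. The sketch below is one possible realization and an assumption of ours, not a construction taken from the text: propositions are modeled as subsets of a hypothetical three-color sample space, and their 0/1 indicator vectors play the role of the algebraic elements. The plain-sum check for OR is restricted to mutually exclusive propositions, since the indicator vectors of overlapping propositions would add to values larger than one.

```python
# Hypothetical sample space of three atomic color propositions.
X = ("g", "r", "b")

ZERO = frozenset()      # trivially false proposition
ONE = frozenset(X)      # trivially true proposition 1_X


def atom(name):
    """Atomic proposition, modeled as a one-element set."""
    return frozenset({name})


def AND(p, q):
    return p & q


def OR(p, q):
    return p | q


def NOT(p):
    return ONE - p


def indicator(p):
    """0/1 vector over X: the algebraic form of a proposition."""
    return tuple(int(name in p) for name in X)


g, r, b = (atom(name) for name in X)

# Distinct atoms are mutually exclusive, and together they are exhaustive.
assert AND(g, r) == ZERO
assert OR(OR(g, r), b) == ONE
# Idempotence, identity, and complementation.
assert AND(g, g) == g
assert AND(ONE, g) == g
assert OR(g, NOT(g)) == ONE

# In the indicator picture AND is a product and NOT is subtraction from 1_X.
p, q = OR(g, r), OR(r, b)
assert indicator(AND(p, q)) == tuple(s * t for s, t in zip(indicator(p), indicator(q)))
assert indicator(NOT(p)) == tuple(1 - s for s in indicator(p))
# For mutually exclusive propositions OR is a plain sum.
assert indicator(OR(g, r)) == tuple(s + t for s, t in zip(indicator(g), indicator(r)))
```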

Observables. — Finally, we extend $\Sigma_X$ linearly over the
real numbers to obtain the commutative algebra of observables
$\Sigma_X^{\mathbb{R}}$. That is, any linear combination of propositions
$F = ax + by$ with $a, b \in \mathbb{R}$ and $x, y \in \Sigma_X$ is an
observable in $\Sigma_X^{\mathbb{R}}$; similarly any linear combination of observables
$H = a'F + b'G$ with $a', b' \in \mathbb{R}$ and $F, G \in \Sigma_X^{\mathbb{R}}$ is
also an observable in $\Sigma_X^{\mathbb{R}}$. Countable sums are permitted
provided the coefficients converge. The three spaces $X$,
$\Sigma_X$, and $\Sigma_X^{\mathbb{R}}$ are illustrated in Fig. 1.

The observables combine logical propositions with 
numbers that describe the relation of each proposition 
to some meaningful reference. For example, one could 
define a simple observable $A = (1)\,g + (-1)\,r$ that assigns
a value of $1$ to the proposition asking whether a
marble looks green and assigns a value of $-1$ to the
proposition asking whether that same marble looks red
in order to distinguish the colors by a sign. Alternatively,
one can bestow a physical meaning to the color
propositions by defining a wavelength observable instead:
$B = (550\,\mathrm{nm})\,g + (700\,\mathrm{nm})\,r$. One could even define an
observable $C = (\$2)\,g + (-\$3)\,r$ that indicates a monetary
bet made on the color of the marble, with \$2 being
awarded for a color of green and \$3 being lost for a color
of red. Such numerical labels are always assigned by 
convention, but indicate physically relevant information 
about the type of questions being asked by the experi- 
menter that are answerable by the independent proposi- 
tions. 
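
To make the construction concrete, the following sketch stores an observable as a table of real coefficients, one per atomic proposition, and builds the three example observables by linear combination. The dictionary representation and the helper names `proposition` and `combine` are illustrative assumptions rather than notation from the text.

```python
X = ("g", "r")  # atomic propositions: looks green, looks red


def proposition(*names):
    """A proposition as 0/1 coefficients over the atoms in X."""
    return {x: (1.0 if x in names else 0.0) for x in X}


def combine(a, F, b, G):
    """Linear combination a*F + b*G of two observables."""
    return {x: a * F[x] + b * G[x] for x in X}


g, r = proposition("g"), proposition("r")

A = combine(1, g, -1, r)       # sign observable
B = combine(550, g, 700, r)    # wavelength observable, in nm
C = combine(2, g, -3, r)       # betting observable, in dollars

# Linear combinations of observables are again observables.
H = combine(0.5, A, 0.5, C)
print(A, B, C, H)
```

The same atoms carry all three observables; only the conventionally assigned numerical labels differ.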

Representation. — The algebra $\Sigma_X$ can be represented
as the lattice of projection operators acting on a Hilbert 
space exactly as in the standard representation of quan- 
tum theory 0, [H, [lO]. The elements {x} of X cor- 
respond to rank-1 projection operators $\{|x\rangle\langle x|\}$ onto
orthogonal subspaces spanned by orthonormal vectors
$\{|x\rangle\}$ in the Hilbert space. Any sum of $n$ elements of
$X$, $x_1 + \cdots + x_n$, corresponds to a rank-$n$ projection operator
$|x_1, \ldots, x_n\rangle\langle x_1, \ldots, x_n|$ onto a subspace spanned
by $n$ orthonormal vectors $\{|x_1\rangle, \ldots, |x_n\rangle\}$ in the Hilbert
space. Hence, we shall casually refer to propositions of
the Boolean algebra $\Sigma_X$ as projections in what follows.
However, it is important to note that the Boolean algebra
$\Sigma_X$ need not be represented in this fashion to be well
defined. 

Just as the propositions $\Sigma_X$ can be represented as
projections on a Hilbert space, the observables $\Sigma_X^{\mathbb{R}}$ can
also be represented as the algebra of Hermitian operators
acting on the same Hilbert space. Hence, we shall casually
refer to observables in $\Sigma_X^{\mathbb{R}}$ as observable operators in
analogy to the quantum theory. However, unlike quan- 
tum observables, all classical observables commute. It is 
important to note that the representation of observables 
as operators on a Hilbert space in both the classical and 
the quantum case remains strictly optional for calcula- 
tional convenience. 

Independent Probability Observables. — We note that 
the identity observable $1_X$ can be partitioned into many
distinct sets of independent propositions in $\Sigma_X$, such
that $\sum_i x_i = 1_X$, which is known as a closure relation.
Each partitioning corresponds to a particular detector
arrangement that only probes those propositions. Such
a partitioning $\{x_i\}$ has the common mathematical name
projection-valued measure (PVM) since it forms a measure
over the index $i$ and has a representation that consists
of orthogonal projections. However, we shall make
an effort to call the propositions $\{x_i\}$ independent probability
observables to be more physically descriptive. We
will later contrast them with more general probability 
observables. 

General observables can be constructed from independent
probability observables by associating a real value
$f(x_i)$ to each index $i$ in the sum, $F = \sum_i f(x_i)\, x_i$. The
product of the observable with any of its constituent
probability observables simplifies, $F x_i = f(x_i)\, x_i$; hence,
the associated values form the set of eigenvalues for the
observable. For a finite observable space the set of
atomic propositions $X$ itself is a maximally refined set of
independent probability observables that can construct
any observable in the space,

$$F = \sum_{x \in X} f(x)\, x. \tag{3}$$

In the continuous case the values $f(x)$ form a measurable
function that specifies the spectrum of the observable;
the sum (3) is then commonly written as an integral
over the continuous set of propositions $\{|x\rangle\langle x|\}$,
$F = \int_X f(x)\, d|x\rangle\langle x|$. We use the Hilbert space notation
$d|x\rangle\langle x|$ in the integral to avoid later confusion with real-valued
integrals.
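
The closure relation and the eigenvalue relation can be verified directly in a finite example. The sketch below assumes a hypothetical three-color sample space and an arbitrary eigenvalue assignment `f`; observables are again stored as coefficient tables, which is our illustrative choice of representation.

```python
X = ("g", "r", "b")  # hypothetical sample space


def atom(name):
    """Atomic proposition as a 0/1 coefficient table over X."""
    return {x: int(x == name) for x in X}


def add(F, G):
    return {x: F[x] + G[x] for x in X}


def scale(c, F):
    return {x: c * F[x] for x in X}


def product(F, G):
    """Commutative product of two observables (elementwise on atoms)."""
    return {x: F[x] * G[x] for x in X}


def observable(f):
    """Build F = sum_x f(x) x as in Eq. (3)."""
    F = {x: 0 for x in X}
    for name in X:
        F = add(F, scale(f[name], atom(name)))
    return F


f = {"g": 1, "r": -1, "b": 0}   # hypothetical eigenvalue assignment
F = observable(f)

# Closure relation: the atoms sum to the identity 1_X.
identity = {x: 0 for x in X}
for name in X:
    identity = add(identity, atom(name))
assert identity == {x: 1 for x in X}

# Eigenvalue relation: F x = f(x) x for every atom x.
for name in X:
    assert product(F, atom(name)) == scale(f[name], atom(name))
```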

B. States, densities, and collapse 

Probability measures. — A state $P$ is a probability measure
over the Boolean algebra $\Sigma_X$, meaning that it is
a linear map from $\Sigma_X$ to the interval $[0, 1]$ such that
$P(1_X) = 1$. Such a state $P$ assigns a numerical value
$P(x)$ to each proposition $x \in \Sigma_X$ that quantifies its degree
of plausibility; that is, $P(x)$ formally indicates how
likely it is that the question $x$ would be answered "yes"
were it to be answered, with $1$ indicating a certain "yes"
and $0$ indicating a certain "no." The value $P(x)$ is called
the probability for the proposition $x$ to be true. Normalizing
$P(1_X) = 1$ ensures that exactly one proposition in
the sample space must be true. For continuous spaces,
the state becomes an integral $P(x_0) = \int_{x \in x_0} dP(x)$.

Frequencies. — Empirically, one can check probabilities 
by repeatedly posing a proposition in $\Sigma_X$ to identically
prepared systems and collecting statistics regarding the
answers. For a particular proposition $x \in \Sigma_X$, the ratio
of yes-answers to the number of trials will converge to the 
probability P{x) as the number of trials becomes infinite. 
However, the probability has a well-defined meaning as a 
plausibility prediction even without actually performing 
such a repeatable experiment. Indeed, designing good 
quality repeatable experiments to check the probabili- 
ties assigned by a predictive state is the primary goal of 
experimental science, and is generally quite difficult to 
achieve. 

Expectation functionals. — The linear extension of a
state $P$ to the whole observable algebra $\Sigma_X^{\mathbb{R}}$ is an expectation
functional that averages the observables, and
is traditionally notated with angled brackets $\langle \cdot \rangle$. Specifically,
for an observable $F = \sum_{x \in X} f(x)\, x$, then,

$$\langle F \rangle = \sum_{x \in X} f(x)\, P(x), \tag{4}$$

is the expectation value, or average value, of $F$ under
the functional $\langle \cdot \rangle$ that extends the probability state $P$.
Since $\langle \cdot \rangle$ is linear, it passes through the sum and the
constant factors of $f(x)$ to apply directly to the propositions
$x$. The restriction of $\langle \cdot \rangle$ to $\Sigma_X$ is $P$, so $\langle x \rangle = P(x)$
as written in (4). That is, the expectation value $\langle x \rangle$ of
a pure proposition $x$ is the probability of that proposition.
The probability state $P$ and its linear extension $\langle \cdot \rangle$
are illustrated in Fig. 1. For continuous spaces the sum
(4) becomes an integral of the measurable function $f(x)$,
$\langle F \rangle = \langle \int_X f(x)\, d|x\rangle\langle x| \rangle = \int_X f(x)\, dP(x)$.

Moments. — The $n$th statistical moment of $F$ is $\langle F^n \rangle =
\sum_{x \in X} f^n(x)\, P(x)$ and empirically corresponds to measuring
the observable $F$ $n$ times in a row per trial on identical
systems and averaging the repeated results. Hence,
the moments quantify the fluctuations of the observable
measurements that stem from uncertainty in the state.
For continuous spaces, the higher moments also become
integrals $\langle F^n \rangle = \int_X f^n(x)\, dP(x)$.
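
For a finite sample space the expectation value and the higher moments reduce to weighted sums, as in the short sketch below. The state `P` is a hypothetical example (a bag with 30% green marbles), and the variance is included only as a familiar combination of the first two moments.

```python
from fractions import Fraction

P = {"g": Fraction(3, 10), "r": Fraction(7, 10)}   # hypothetical state
f = {"g": 1, "r": -1}                              # eigenvalues of the color observable


def moment(f, P, n=1):
    """n-th moment <F^n> = sum_x f(x)^n P(x); n = 1 gives the expectation value."""
    return sum(f[x] ** n * P[x] for x in P)


mean = moment(f, P)
second = moment(f, P, 2)
variance = second - mean ** 2
print(mean, second, variance)
```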

Densities. — States can often be represented as densities
with respect to some reference measure on $\Sigma_X$,
which can be convenient for calculational purposes.
Just as the state $P$ can be linearly extended to an expectation
functional $\langle \cdot \rangle$, any reference measure $\mu$ can
be linearly extended to a functional $\langle \cdot \rangle_\mu$. For continuous
spaces, such a reference functional takes the form
of an integral $\langle F \rangle_\mu = \int_X f(x)\, d\mu(x)$. The representation
of a state as a density follows from changing the integration
measure for the state to the reference measure
$\langle F \rangle = \int_X f(x)\, dP(x) = \int_X f(x)\, (dP/d\mu)(x)\, d\mu(x)$. The
Jacobian conversion factor $dP/d\mu$ from the integral over
$dP(x)$ to the integral over a different measure $d\mu(x)$ is
the probability density for $P$ with respect to $\mu$, if it exists
[71]. We can then define a state density observable
$P_\mu = \int_X (dP/d\mu)(x)\, d|x\rangle\langle x|$ that relates the expectation
functional $\langle \cdot \rangle$ to the reference functional $\langle \cdot \rangle_\mu$ directly
according to the relation $\langle P_\mu F \rangle_\mu = \langle F \rangle$.

For continuous spaces, the standard integral is most 
frequently used as a reference. Hence, the probability 
density with respect to the standard integral is given the 
simple notation $p(x)$ such that $\langle F \rangle = \int_X f(x)\, p(x)\, dx$.
Importantly, the probability for $x$ is not the density
$p(x) = (dP/dx)(x)$, but is the (generally infinitesimal)
integral of the density over a single point $P(x) =
\int_x^{x+dx} p(x')\, dx'$ [72, 73], commonly notated $p(x)\,dx$.

In discrete spaces we apply the same idea by defining 
a state density observable directly in terms of measure 
ratios, 



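$$P_\mu = \sum_{x \in X} \frac{P(x)}{\mu(x)}\, x. \tag{5}$$

Then by definition and linearity, $\langle P_\mu F \rangle_\mu = \sum_{x \in X} [P(x)/\mu(x)]\, f(x)\, \mu(x) = \langle F \rangle$
as required. Evidently, the measure $\mu$ must be nonzero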
for all propositions x for which P is nonzero in order for 
such a state density to be well defined. This definition 
as a ratio of functionals will correctly reproduce the 
Jacobian derivative in the continuous case using a 
limiting prescription. 
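
The defining property of the discrete state density can be checked numerically. In the sketch below the state `P` and the reference measure `mu` are hypothetical values chosen only for illustration; the check confirms that averaging the product of the density and the observable under the reference functional reproduces the expectation value under the state.

```python
from fractions import Fraction

P = {"g": Fraction(3, 10), "r": Fraction(7, 10)}   # hypothetical state
mu = {"g": Fraction(2), "r": Fraction(5)}          # hypothetical reference measure
f = {"g": 1, "r": -1}                              # eigenvalues of an observable F


def functional(measure, values):
    """Linear extension of a measure: sum_x values(x) * measure(x)."""
    return sum(values[x] * measure[x] for x in measure)


# State density with respect to mu: coefficients P(x) / mu(x).
density = {x: P[x] / mu[x] for x in P}

# Coefficients of the product observable (density * F).
weighted = {x: density[x] * f[x] for x in f}

assert functional(mu, weighted) == functional(P, f)
```

The division by `mu[x]` makes explicit why the reference measure must be nonzero wherever the state is nonzero.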

Trace. — An important reference measure which is 
nonzero for any nonzero proposition is the counting mea- 
sure, or trace Tr, which evaluates to the rank of any 
proposition in $\Sigma_X$; for example, given $x, y, z \in X$ then
$(x + y + z) \in \Sigma_X$ is a rank-3 proposition and $\mathrm{Tr}(x + y + z) =
\mathrm{Tr}(x) + \mathrm{Tr}(y) + \mathrm{Tr}(z) = 1 + 1 + 1 = 3$. Since the trace evaluates
to unity on any atomic proposition, any state has a
trace-density defined by equation (5) that is traditionally
notated as $\rho$,

$$\rho = \sum_{x \in X} P(x)\, x. \tag{6}$$



The trace-density is the only state density that is always 
defined and exactly determined by the probabilities of 
the atomic propositions $P(x)$. Because of this, the trace-representation
of a state can be naturally interpreted as
an inner product,

$$(\rho, F) = \mathrm{Tr}(\rho F) = \langle F \rangle, \tag{7}$$

between the trace-density and the observable, known as
the Hilbert-Schmidt inner product. The trace will become
particularly important when we generalize to quantum
mechanics, which is why we mention it here. Indeed,
the trace-density $\rho$ will be equivalent to the quantum
mechanical density operator when extended to the noncommutative
case. For continuous spaces the integral is
traditionally preferred to the trace as a reference because 
the trace can frequently diverge. 
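
As a small illustration of the trace as a counting measure, the sketch below stores observables as coefficient tables over three atoms and checks that the trace of a rank-3 proposition is 3, that the trace-density has unit trace, and that the trace of its product with an observable gives the expectation value. The probabilities and eigenvalues are hypothetical.

```python
from fractions import Fraction

X = ("x", "y", "z")
P = {"x": Fraction(1, 2), "y": Fraction(3, 10), "z": Fraction(1, 5)}  # hypothetical state
f = {"x": 2, "y": 0, "z": -1}                                          # hypothetical eigenvalues


def trace(F):
    """Counting measure: sum of the coefficients on the atoms."""
    return sum(F[a] for a in X)


def product(F, G):
    return {a: F[a] * G[a] for a in X}


rank3 = {a: 1 for a in X}          # the proposition x + y + z
rho = dict(P)                      # trace-density: coefficients P(x)
F = dict(f)                        # observable with coefficients f(x)

expectation = sum(f[a] * P[a] for a in X)

assert trace(rank3) == 3
assert trace(rho) == 1
assert trace(product(rho, F)) == expectation
```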

State collapse. — If a question on the probability space 
is answered by some experiment, then the state indicat- 
ing the plausibilities for future answers must be updated 
to reflect the acquired answer. The update process is 
known as Bayesian state conditioning, or state collapse. 
Specifically, if a proposition y £ Ex is verified to be true, 
then the experimenter updates the expectation functional 
to the conditioned functional. 



in. = 



(8) 



that refiects the new information. For a proposition 
X e Ex, the conditional probability {x)^ = P{yx)/P{y) 

has the traditional notation P{x\y) and is read as "the 
probability of x given y." 

From (8), any state density $P_\mu$ corresponding to $P$ will be similarly updated
to a new density via a product,

$$P_{\mu|y} = \frac{P_\mu\, y}{P(y)}. \tag{9}$$

Notably, conditioning the trace-density $\rho$ on an atomic proposition $y \in X$
will collapse the density to become the proposition itself,
$\rho_y = \rho\, y / P(y) = y$.

Note that the proposition $y$ serves a dual role in the conditioning procedure.
First, it is used to compute the normalization probability $P(y)$. Second, it
directly updates the state via a product action. The product indicates that
future questions will be logically linked to the answered question with the
logical "and" operation; that is, the knowledge about the system has been refined
by the answered question. The process of answering a question about the system
and then conditioning the state on the new information is called a measurement;
moreover, since the proposition $y$ is a projection acting on the density, this
kind of measurement is called a projective measurement.
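The conditioning rule (9) can be sketched numerically with the trace as the reference measure. The sample space, the state, and the function name `condition` below are assumptions made for illustration only.

```python
# Hypothetical three-atom sample space and state.
X = ["a", "b", "c"]
P = {"a": 0.5, "b": 0.3, "c": 0.2}

def condition(rho, y):
    """Eq. (9) with the trace as reference: rho_y = rho * y / P(y)."""
    p_y = sum(rho[x] * y[x] for x in X)          # P(y) = Tr(rho y)
    return {x: rho[x] * y[x] / p_y for x in X}

y = {"a": 1, "b": 1, "c": 0}        # the proposition "a or b"
rho_y = condition(P, y)             # weight is renormalized onto a and b

atom_b = {"a": 0, "b": 1, "c": 0}   # an atomic proposition
rho_b = condition(P, atom_b)        # collapses onto the atom b itself
```

Conditioning on the atomic proposition returns the proposition itself as the new density, which is the classical counterpart of projective collapse.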

Bayes' rule. — If we pick another proposition $z \in \Sigma_X$ as the observable
in (8) we can derive Bayes' rule as a necessary consequence by interchanging $y$
and $z$ and then equating the joint probabilities $P(yz)$,

$$P(z|y) = P(y|z)\, \frac{P(z)}{P(y)}. \tag{10}$$

Bayes' rule relates conditioned expectation functionals to one another and so is
a powerful logical inference tool that drives much of the modern emphasis on the
logical approach to probability theory.

Disturbance. — Conditioning, however, is not the only way that one can alter a
state. One can also disturb a state without learning any information about it,
which creates a transition to an updated expectation functional that we denote
with a tilde according to

$$\widetilde{\langle F \rangle} = \langle \mathcal{D}(F) \rangle, \tag{11a}$$

$$\mathcal{D}(F) = \sum_{x \in X} \langle F \rangle_{D_x}\, x, \tag{11b}$$

$$\langle F \rangle_{D_x} = \sum_{x' \in X} f(x')\, D_x(x'). \tag{11c}$$

Here the disturbance $\mathcal{D}$ is a map from $\Sigma_X^{\mathbb{R}}$ to
$\Sigma_X^{\mathbb{R}}$ that is governed by a collection of states $\{D_x\}$ that
specify transition probabilities $D_x(x')$ from old propositions $x$ to new
propositions $x'$. To be normalized, the transition states must satisfy
$D_x(1_X) = 1$, so that $\langle 1_X \rangle_{D_x} = 1$ and therefore
$\mathcal{D}(1_X) = 1_X$. Updating the state according to (11) is also known as
Bayesian belief propagation [67] and is more commonly written in the fully
expanded form
$\widetilde{\langle F \rangle} = \sum_{x \in X} P(x) \sum_{x' \in X} D_x(x')\, f(x')$.
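A short numerical sketch of the disturbance (11) follows, under the assumption of a two-atom sample space with made-up transition probabilities. The nested dictionary `D` and the helper names are hypothetical and serve only to make the index structure of (11b) and (11c) concrete.

```python
# Hypothetical two-atom system, state, and observable.
X = ["a", "b"]
P = {"a": 0.8, "b": 0.2}
f = {"a": 1.0, "b": -1.0}

# D[x][x2] is the assumed transition probability D_x(x') from old atom x
# to new atom x'; each row sums to one so that D(1_X) = 1_X.
D = {"a": {"a": 0.9, "b": 0.1},
     "b": {"a": 0.25, "b": 0.75}}

def disturb(obs):
    """Eqs. (11b)-(11c): D(F) = sum_x [sum_x' f(x') D_x(x')] x."""
    return {x: sum(obs[x2] * D[x][x2] for x2 in X) for x in X}

def expect(obs):
    """Expectation under the initial state: <F> = sum_x P(x) f(x)."""
    return sum(P[x] * obs[x] for x in X)

disturbed_mean = expect(disturb(f))        # Eq. (11a)
identity = disturb({x: 1.0 for x in X})    # normalization check
```

Note that the disturbance acts on the observable while the state is left fixed, which is the same bookkeeping used for the time-evolving observables discussed next.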

Time evolution. — As an important special case, the time evolution of a
Markovian stochastic process is a form of disturbance $\mathcal{D}_t$, known as a
propagator, that is parametrized by a time interval $t$. No information is
learned as the system evolves, so the knowledge about the system as represented
by the expectation functional can only propagate according to the laws governing
the time evolution. For a Hamiltonian system, the time evolution is of Liouville
form; that is, if we define a time-evolving observable as
$F(t) = \mathcal{D}_t(F)$ then we have $dF(t)/dt = \{F(t), H\}_P$, where
$\{\cdot, \cdot\}_P$ is defined pointwise as the Poisson bracket. The
differential equation implicitly specifies the form of the disturbance
$\mathcal{D}_t$.

Correlation functions. — Correlations between observables at different times can
be obtained by inserting a time-evolution disturbance between the observable
measurements,

$$\langle F(0)\, G(t) \rangle = \langle F\, \mathcal{D}_t(G) \rangle = \sum_{x \in X} P(x)\, f(x) \sum_{x' \in X} D_{x,t}(x')\, g(x'). \tag{12}$$

Operationally this corresponds to measuring the observable $F$, waiting an
interval of time $t$, then measuring the observable $G$. Similarly, $n$-time
correlations can be defined with $n-1$ time-evolution disturbances between the
observable measurements,
$\langle F_1\, \mathcal{D}_{t_1}(F_2 \cdots \mathcal{D}_{t_{n-1}}(F_n) \cdots) \rangle$.
Computing the correlation of an observable with itself at the same time will
produce a higher moment $\langle F^n \rangle$.

Invasive measurement. — A system may also be disturbed during the physical
process that implements conditioning, which will alter the state above and beyond
the pure conditioning expression (8). With such an invasive measurement, one
conditions a state after a disturbance induced by the measurement process has
occurred; hence, one obtains a new state,

$$\widetilde{\langle F \rangle}_y = \frac{\langle \mathcal{D}(yF) \rangle}{\langle \mathcal{D}(y) \rangle} = \frac{\sum_{x \in X} P(x) \sum_{x' \in X} D_x(y x')\, f(x')}{\sum_{x \in X} P(x)\, D_x(y)}, \tag{13}$$

which is a composition of the measurement disturbance (11) followed by the pure
conditioning (8).
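To make the difference between pure conditioning (8) and invasive conditioning (13) concrete, the following sketch evaluates both for the same proposition. The three-atom space, the transition table `D`, and all numbers are hypothetical; for an atomic $x'$ the quantity $D_x(y x')$ is taken to be $D_x(x')$ when $x'$ is contained in $y$ and zero otherwise.

```python
# Hypothetical system, observable, and tested proposition y = "a or b".
X = ["a", "b", "c"]
P = {"a": 0.5, "b": 0.3, "c": 0.2}
f = {"a": 1.0, "b": -1.0, "c": 3.0}
y = {"a": 1, "b": 1, "c": 0}

# Assumed transition probabilities D_x(x') induced by the measurement process.
D = {"a": {"a": 0.8, "b": 0.1, "c": 0.1},
     "b": {"a": 0.2, "b": 0.6, "c": 0.2},
     "c": {"a": 0.0, "b": 0.5, "c": 0.5}}

def pure_conditioned_mean():
    """Eq. (8): <F>_y = <yF> / P(y), with no disturbance."""
    return sum(P[x] * y[x] * f[x] for x in X) / sum(P[x] * y[x] for x in X)

def invasive_conditioned_mean():
    """Eq. (13): disturb first, then condition on y."""
    num = sum(P[x] * sum(D[x][x2] * y[x2] * f[x2] for x2 in X) for x in X)
    den = sum(P[x] * sum(D[x][x2] * y[x2] for x2 in X) for x in X)
    return num / den

pure = pure_conditioned_mean()
invasive = invasive_conditioned_mean()
```

With these numbers the two conditioned averages differ, which illustrates that the disturbance cannot in general be ignored when the conditioning is physically implemented.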

As we shall see later in Sec. III B, the quantum projection postulate (Lüders'
rule) can be understood as an invasive measurement similar to (13), but not as
pure conditioning similar to (8). This observation has also been recently
emphasized by Leifer and Spekkens [67], who show that a careful extension of (8)
to the noncommutative quantum setting does not reproduce the projection
postulate. Hence, better understanding classical invasive measurement should
provide considerable insight into the quantum measurement process. However, to
properly understand the implications of invasive measurements on the measurement
of observables, we must consider the measurement process in more detail.



C. Detectors and probability observables 

For a single ideal experiment that answers questions of 
interest with perfectly correlated independent outcomes, 
knowing the spectrum of an observable for that experi- 
ment is completely sufficient. However, in many (if not 
most) cases the independent propositions corresponding 
to the experimental outcomes are only imperfectly cor- 
related with the questions of interest about the system. 
Since in such a case one may not have direct access to the 
questions of interest, one also may not have direct access 
to the observables of interest. One must instead infer 
information about the observables of interest indirectly 
from the correlated outcomes of the detector to which 
one does have access. 

Joint sample space. — To handle this case formally, we first enlarge the sample
space to include both the sample space of interest, which we call the system,
$X$, and the accessible sample space, which we call the detector, $Y$. Questions
about the system and the detector can be asked independently, so every question
for the system can be paired with any question from the detector; therefore, the
resulting joint sample space must be a product space,
$XY = \{xy \mid x \in X,\ y \in Y\}$, where the products of propositions from
different sample spaces commute. The Boolean algebra $\Sigma_{XY}$ and observable
algebra $\Sigma_{XY}^{\mathbb{R}}$ are constructed in the usual way from the joint
sample space, and contain the algebras $\Sigma_X$, $\Sigma_Y$,
$\Sigma_X^{\mathbb{R}}$, and $\Sigma_Y^{\mathbb{R}}$ as subalgebras. When
represented as operators on a Hilbert space, the corresponding joint
representation exists within the tensor product of the system and detector space
representations.

Product states. — If the probabilities of the system propositions are
uncorrelated with the probabilities of the detector propositions under a joint
state $P$ on the joint sample space, then the joint state can be written as a
composition of independent states that are restricted to the sample spaces of the
system and detector, $P = P_X \circ P_Y$. Just as the state $P$ has a linear
extension to $\langle \cdot \rangle$, its restrictions $P_X$ and $P_Y$ have linear
extensions $\langle \cdot \rangle_X$ and $\langle \cdot \rangle_Y$, respectively.
Thus, for any joint observable $F$ an uncorrelated expectation has the form
$\langle F \rangle = \langle \langle F \rangle_Y \rangle_X = \langle \langle F \rangle_X \rangle_Y$.
Such an uncorrelated joint state is known as a product state. The name stems from
the fact that for a simple product $F_X F_Y$ of system and detector observables
the corresponding joint expectation decouples into a product of system and
detector expectations separately,
$\langle F_X F_Y \rangle = \langle F_X \rangle_X \langle F_Y \rangle_Y$.

Similarly, general measures on the joint sample space can be product measures. A
particularly useful example is the trace
$\mathrm{Tr} = \mathrm{Tr}_X \circ \mathrm{Tr}_Y$ on $XY$, which is composed of the
partial traces, $\mathrm{Tr}_X$ and $\mathrm{Tr}_Y$. The trace serves as a
convenient reference measure since it is a product measure for which any joint
state has a corresponding density. On continuous spaces the standard integral is
also a product measure,
$\langle F \rangle = \int_X \left[ \int_Y f(x,y)\, p(x,y)\, dy \right] dx = \int_Y \left[ \int_X f(x,y)\, p(x,y)\, dx \right] dy$,
which tends to have noninfinitesimal densities.

Correlated states. — In addition to product states, the 
joint space admits a much larger class of correlated states 
where the detector and system questions are dependent 
on one another. With such a correlated state a mea- 
surement on the detector cannot be decoupled in general 
from a measurement on the system. Information gath- 
ered from a measurement on a detector under a correlated 
state will also indirectly provide information about the 
system, thus motivating the term "detector." 

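Reduced states. — For a pure system observable $F_X$ or a pure detector
observable $F_Y$, the average under a joint state will be equivalent to the
average under a state restricted to either the system or the detector space,
known as a reduced state, or a marginalized state. We can define such a reduced
state by using the joint state density $P_\mu$ under any reference product
measure $\mu = \mu_X \circ \mu_Y$, such as the trace $\mathrm{Tr}$. It then
follows that

$$\langle F_X \rangle = \langle \langle P_\mu \rangle_{\mu_Y} F_X \rangle_{\mu_X} = \langle P_{\mu_X} F_X \rangle_{\mu_X}, \tag{14a}$$

$$\langle F_Y \rangle = \langle \langle P_\mu \rangle_{\mu_X} F_Y \rangle_{\mu_Y} = \langle P_{\mu_Y} F_Y \rangle_{\mu_Y}. \tag{14b}$$

The quantities $P_{\mu_X} = \langle P_\mu \rangle_{\mu_Y}$ and
$P_{\mu_Y} = \langle P_\mu \rangle_{\mu_X}$ are the reduced state densities that
define the reduced states $P_X$ and $P_Y$ with expectation functionals

$$\langle F_X \rangle_X = \langle P_{\mu_X} F_X \rangle_{\mu_X}, \tag{15a}$$

$$\langle F_Y \rangle_Y = \langle P_{\mu_Y} F_Y \rangle_{\mu_Y}. \tag{15b}$$

By definition, $\langle F_X \rangle = \langle F_X \rangle_X$ and
$\langle F_Y \rangle = \langle F_Y \rangle_Y$. However, in general
$\langle F \rangle \neq \langle \langle F \rangle_Y \rangle_X$,
$\langle F \rangle \neq \langle \langle F \rangle_X \rangle_Y$, and
$\langle \langle F \rangle_Y \rangle_X \neq \langle \langle F \rangle_X \rangle_Y$
unless $P$ is a product state. The resulting reduced expectations
$\langle \cdot \rangle_X$ and $\langle \cdot \rangle_Y$ are independent of the
choice of reference product functional $\mu$.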

Probability observables. — Any correlation between the system and detector in the
joint state allows us to directly relate propositions on the detector to
observables on the system. We can compute the relationship directly by using a
closure relation and rearranging the conditioning procedure (8) to find

$$P(y) = \sum_{x \in X} P(xy) = \sum_{x \in X} P(y|x)\, P_X(x) = \langle E_y \rangle_X, \tag{16}$$

$$E_y = \sum_{x \in X} P(y|x)\, x. \tag{17}$$

The resulting set of system observables $\{E_y\}$ exactly correspond to the
detector outcomes $\{y\}$. Analogously to a set of independent probability
observables, they form a partition of the system identity, but are indexed by
detector propositions rather than by system propositions,
$\sum_{y \in Y} E_y = 1_X$. Such a set $\{E_y\}$ has the common mathematical name
positive operator-valued measure (POVM) [11], since it forms a measure over the
detector sample space $Y$ consisting of positive operators. However, we shall
make an effort to refer to them as general probability observables to emphasize
their physical significance. As long as the detector outcomes are not mutually
exclusive with the system, the probability observables (17) will be a faithful
representation of the reduced state of the detector in the observable space of
the system.
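The construction (16) and (17) can be sketched for a two-outcome detector. The response functions and the reduced system state below are hypothetical numbers (loosely styled after the marble example, but not taken from it), and the dictionary layout is our own choice.

```python
# Hypothetical system atoms (marble colors) and detector atoms (light colors).
X = ["g", "r"]
Y = ["b", "y"]

# Assumed response functions P(y|x), as would come from detector tomography.
response = {"g": {"b": 0.6, "y": 0.4},
            "r": {"b": 0.3, "y": 0.7}}
P_X = {"g": 0.25, "r": 0.75}             # assumed reduced system state

# Eq. (17): E_y = sum_x P(y|x) x, stored as coefficients over the system atoms.
E = {y: {x: response[x][y] for x in X} for y in Y}

def expect_X(obs):
    """Reduced system expectation <.>_X."""
    return sum(P_X[x] * obs[x] for x in X)

P_Y = {y: expect_X(E[y]) for y in Y}                 # Eq. (16): P(y) = <E_y>_X
identity = {x: sum(E[y][x] for y in Y) for x in X}   # sum_y E_y = 1_X
```

The probability observables depend only on the response functions, so the same `E` can be reused with any other system state.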

Process tomography. — The probability observables are completely specified by the
conditional likelihoods $P(y|x)$ for a detector proposition $y$ to be true given
that a system proposition $x$ is true. Such conditional likelihoods are more
commonly known as response functions for the detector and can be determined via
independent detector characterization using known reduced system states; such
characterization is also known as detector tomography, or process tomography. Any
good detector will then maintain its characterization with any unknown reduced
system state. That is, a noninvasive coupling of such a good detector to an
unknown system produces a correlated joint state according to
$P(xy) = P_X(x)\, P(y|x)$, where $P_X$ is the unknown reduced system state prior
to the interaction with the detector.

Generalized state collapse. — In addition to allowing the computation of detector
probabilities, $P(y) = \langle E_y \rangle_X$, probability observables also have
the dual role of updating the reduced system state following a measurement on the
detector. To see this, we apply the general rule for state collapse (8) for a
detector proposition $y$ on the joint state to find

$$\langle F_X \rangle_y = \frac{\langle y F_X \rangle}{P(y)} = \sum_{x \in X} f_X(x)\, \frac{P(y|x)\, P_X(x)}{P(y)} = \frac{\langle E_y F_X \rangle_X}{\langle E_y \rangle_X}, \tag{18}$$

which can be seen as a generalization of the Bayesian conditioning rule (8) to
account for the effect of an imperfectly correlated detector, and can also be
understood as a form of Jeffrey's conditioning [74]. For this reason, probability
observables are commonly called effects of the generalized measurement. A reduced
state density $P_{\mu_X}$ for the system updates as
$P_{\mu_X|y} = P_{\mu_X} E_y / \langle E_y \rangle_X$. Such a generalized
measurement is nonprojective, so is not constrained to the disjoint questions on
the sample space of the system. As a result, it answers questions on the system
space ambiguously or noisily.
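A minimal sketch of the generalized collapse rule (18) is given below, assuming a hypothetical two-color system and a two-light detector with made-up response functions. The function name `collapse` is an illustrative choice.

```python
# Hypothetical system atoms, detector atoms, and response functions P(y|x).
X = ["g", "r"]
response = {"g": {"b": 0.6, "y": 0.4},
            "r": {"b": 0.3, "y": 0.7}}
P_X = {"g": 0.25, "r": 0.75}      # assumed prior reduced system state
F_X = {"g": +1.0, "r": -1.0}      # sign observable distinguishing the colors

def collapse(prior, y):
    """Eq. (18): updated density P_X(x) P(y|x) / P(y) after outcome y."""
    p_y = sum(prior[x] * response[x][y] for x in X)
    return {x: prior[x] * response[x][y] / p_y for x in X}

prior_mean = sum(F_X[x] * P_X[x] for x in X)
post_b = collapse(P_X, "b")
mean_given_b = sum(F_X[x] * post_b[x] for x in X)
```

Observing the outcome `b` shifts the state toward `g` without fully deciding the color, which is the sense in which the measurement answers the system question ambiguously.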

Weak measurement. — The extreme case of such an ambiguous measurement is a weak
measurement, which is a measurement that does not (appreciably) collapse the
system state. Such a measurement is inherently ambiguous to the extent that only
a minuscule amount of information is learned about the system with each
detection. Formally, the probability observables for a weak measurement are all
nearly proportional to the identity on the system space. Typically, an
experimenter has access to some control parameter $\epsilon$ (such as the
correlation strength) that can alter the weakness of the measurement such that

$$\forall y, \quad \lim_{\epsilon \to 0} E_y(\epsilon) = P_Y(y)\, 1_X, \tag{19}$$

where $P_Y(y) \in (0,1)$ is the nonzero probability of obtaining the detector
outcome $y$ in the absence of any interaction with the system. Then for small
values of $\epsilon$ the measurement leaves the system state nearly unperturbed,
$P_{\mu_X|y} = P_{\mu_X} E_y(\epsilon) / \langle E_y(\epsilon) \rangle_X \approx P_{\mu_X}$.
The limit as such a control parameter $\epsilon \to 0$ is known as the weak
measurement limit and is a formal idealization not strictly achievable in an
experiment.
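The weak limit (19) can be illustrated with a one-parameter family of detectors. The specific form $P(b|g) = (1+\epsilon)/2$, $P(b|r) = (1-\epsilon)/2$ is an assumption made for this sketch, as are the state and the helper names.

```python
# Hypothetical two-color system and prior reduced state.
X = ["g", "r"]
P_X = {"g": 0.25, "r": 0.75}

def effects(eps):
    """Assumed detector family; every E_y tends to (1/2) 1_X as eps -> 0."""
    return {"b": {"g": (1 + eps) / 2, "r": (1 - eps) / 2},
            "y": {"g": (1 - eps) / 2, "r": (1 + eps) / 2}}

def collapse(E_y):
    """Updated density P_X E_y / <E_y>_X."""
    p_y = sum(P_X[x] * E_y[x] for x in X)
    return {x: P_X[x] * E_y[x] / p_y for x in X}

def shift(eps):
    """Largest change of any atomic probability after observing outcome b."""
    post = collapse(effects(eps)["b"])
    return max(abs(post[x] - P_X[x]) for x in X)

strong_shift = shift(1.0)    # eps = 1: outcome b identifies the color g
weak_shift = shift(0.01)     # small eps: the state is barely perturbed
```

At `eps = 1` the effects become the atomic propositions and the collapse is projective; as `eps` decreases the posterior approaches the prior.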

Strong measurement. — The opposite extreme case is a strong measurement or
projective measurement, which is a measurement for which all outcomes are
independent, as in (3). In other words, the probability observables are
independent for a strong measurement. The projective collapse rule (8) can
therefore be seen as a special case of the general collapse rule (18) from this
point of view.

Measurement sequences. — A further benefit of the probability observable
representation of a detector is that it becomes straightforward to discuss
sequences of generalized measurements performed on the same system. For example,
consider two detectors that successively couple to a system and have the outcomes
$y$ and $z$ measured, respectively. To describe the full joint state of the
system and both detectors requires a considerably enlarged sample space. However,
if the detectors are characterized by two sets of probability observables
$\{E_y\}$ and $\{E'_z\}$ we can immediately write down the probability of both
outcomes to occur as well as the resulting final collapsed system state without
using the enlarged sample space,

$$P(yz) = \langle E'_z E_y \rangle_X, \tag{20a}$$

$$\langle F_X \rangle_{yz} = \frac{\langle E'_z E_y F_X \rangle_X}{\langle E'_z E_y \rangle_X}. \tag{20b}$$

Similarly, a conditioned density takes the form
$P_{\mu_X|yz} = P_{\mu_X} E'_z E_y / \langle E'_z E_y \rangle_X$. The detectors
have been abstracted away to leave only their effect upon the system of interest.

Generalized invasive measurement. — The preceding discussion holds provided that
the detector can be noninvasively coupled to a reduced system state $P_X$ to
produce a joint state $P(xy) = P_X(x)\, P(y|x)$. However, more generally the
process of coupling a reduced detector state $P_Y$ to the reduced system state
$P_X$ will disturb both states as discussed for (11). The disturbance produces a
joint state from the original product state of the system and detector according
to

$$P(xy) = \langle \langle \mathcal{D}(xy) \rangle_Y \rangle_X, \tag{21}$$

$$\mathcal{D}(xy) = \sum_{x' \in X} \sum_{y' \in Y} D_{x',y'}(xy)\, x' y', \tag{22}$$

where $D_{x',y'}$ are states specifying the joint transition probabilities for
the disturbance. The noninvasive coupling $P(xy) = P_X(x)\, P(y|x)$ is a special
case of this where the reduced system state is unchanged by the coupling.

As a result, we must slightly modify the derivation of the probability
observables (16) to properly include the disturbance,

$$\langle y \rangle = \langle \langle \mathcal{D}(y) \rangle_Y \rangle_X = \langle E_y \rangle_X, \tag{23a}$$

$$E_y = \langle \mathcal{D}(y) \rangle_Y = \sum_{x \in X} \sum_{y' \in Y} P_Y(y')\, D_{x,y'}(y)\, x. \tag{23b}$$

The modified probability observable $E_y$ includes both the initial detector
state $P_Y$ and the disturbance from the measurement. Detector tomography will
therefore find the effective characterization probabilities
$P(y|x) = \sum_{y' \in Y} D_{x,y'}(y)\, P_Y(y')$.

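The generalized collapse rule similarly must be modified to include the
disturbance,

$$\widetilde{\langle F_X \rangle}_y = \frac{\langle \langle \mathcal{D}(y F_X) \rangle_Y \rangle_X}{\langle \langle \mathcal{D}(y) \rangle_Y \rangle_X} = \frac{\langle \mathcal{E}_y(F_X) \rangle_X}{\langle E_y \rangle_X}, \tag{24}$$

$$\mathcal{E}_y(F_X) = \langle \mathcal{D}(y F_X) \rangle_Y = \sum_{x \in X} \left[ \sum_{x' \in X} \sum_{y' \in Y} P_Y(y')\, D_{x,y'}(x' y)\, f_X(x') \right] x. \tag{25}$$

Surprisingly, we can no longer write the conditioning in terms of just the
probability observables $E_y$; instead we must use an operation $\mathcal{E}_y$
that takes into account both the coupling of the detector and the disturbance of
the measurement in an active way. The measurement operation is related to the
effective probability observable according to $\mathcal{E}_y(1_X) = E_y$.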

The change from observables to operations when the disturbance is included
becomes particularly important for a sequence of invasive measurements. Consider
an initial system state $P_X$ that is first coupled to a detector state $P_Y$ via
a disturbance $\mathcal{D}_1$, then conditioned on the detector proposition $y$,
then coupled to a second detector state $P_Z$ via a disturbance $\mathcal{D}_2$,
and finally conditioned on the detector proposition $z$. The joint probability
for obtaining the ordered sequence $(y, z)$ can be written as

$$\langle \langle \mathcal{D}_1(y\, \langle \mathcal{D}_2(z) \rangle_Z) \rangle_Y \rangle_X = \langle \mathcal{E}_y(E'_z) \rangle_X. \tag{26}$$

The effective probability observable
$\mathcal{E}_y(\mathcal{E}'_z(1_X)) = \mathcal{E}_y(E'_z)$ for the ordered
measurement sequence $(y, z)$ is no longer a simple product of the probability
observables $E_y$ and $E'_z$ as in (20a), but is instead an ordered composition
of operations.

The ordering of operations also leads to a new form of postselected conditioning.
Specifically, if we condition only on the second measurement of $z$ in an
invasive sequence $(y, z)$, we obtain

$${}_z\widetilde{\langle y \rangle} = \frac{\langle \mathcal{E}_y(E'_z) \rangle_X}{\sum_{y' \in Y} \langle \mathcal{E}_{y'}(E'_z) \rangle_X} = \frac{\langle \mathcal{E}_y(E'_z) \rangle_X}{\langle \mathcal{E}(E'_z) \rangle_X}, \tag{27}$$

$$\mathcal{E} = \sum_{y' \in Y} \mathcal{E}_{y'}. \tag{28}$$

The different position of the subscript serves to distinguish the postselected
probability ${}_z\widetilde{\langle y \rangle}$ from the preselected probability
$\widetilde{\langle y \rangle}_z = \langle \mathcal{E}'_z(E_y) \rangle_X / \langle E'_z \rangle_X$
corresponding to the reverse measurement ordering of $(z, y)$. The operation
$\mathcal{E}$ appearing in the denominator is called a nonselective measurement
since it includes the disturbance induced by the measurement coupling, but does
not condition on any particular detector outcome. When the disturbance to the
reduced system state vanishes, the conditioning becomes order-independent and
both types of conditional probability reduce to
$P(y|z) = \langle E_y E'_z \rangle_X / \langle E'_z \rangle_X$.

The two forms of conditioning for invasive measurements in turn lead to a
modified form of Bayes' rule that relates the preselected conditioning of a
sequence to the postselected conditioning of the same sequence,

$${}_z\widetilde{\langle y \rangle} = \widetilde{\langle z \rangle}_y\, \frac{\langle E_y \rangle_X}{\langle \mathcal{E}(E'_z) \rangle_X}. \tag{29}$$

When the disturbance to the reduced system state vanishes, the nonselective
measurement $\mathcal{E}$ reduces to the identity operation,
${}_z\widetilde{\langle y \rangle}$ reduces to $P(y|z)$,
$\widetilde{\langle z \rangle}_y$ reduces to $P(z|y)$, and we correctly recover
the noninvasive Bayes' rule (10).



D. Contextual values 

Observable correspondence. — With the preliminaries about generalized state
conditioning out of the way, we are now in a position to discuss the measurement
of observables in more detail. First we observe an important corollary of the
observable representation of the detector probabilities
$P(y) = \langle E_y \rangle_X$ from (16): detector observables can be mapped into
equivalent system observables,

$$\langle F_Y \rangle = \sum_{y \in Y} f_Y(y)\, P(y) = \langle F_X \rangle_X, \tag{30}$$

$$F_X = \sum_{y \in Y} f_Y(y)\, E_y. \tag{31}$$

Note that the eigenvalues $f_X(x) = \sum_{y \in Y} f_Y(y)\, P(y|x)$ of the
equivalent system observable $F_X$ are not the same as the eigenvalues $f_Y(y)$
of the original detector observable $F_Y$, but are instead their average under
the detector response. If the system propositions were accessible then the system
observable $F_X$ would allow nontrivial inference about the detector observable
$F_Y$, provided that the probability observables were nonzero for all $y$ in the
support of $F_Y$.

Contextual values. — A more useful corollary of the expansion (31) is that any
system observable that can be expressed as a combination of probability
observables may be equivalently expressed as a detector observable,

$$F_X = \sum_{y \in Y} f_Y(y)\, E_y \implies F_Y = \sum_{y \in Y} f_Y(y)\, y, \tag{32}$$

which is the classical form of our main result. Using this equivalence, we can
indirectly measure such system observables using only the detector. We dub the
eigenvalues of the detector observable $f_Y(y)$ the contextual values (CVs) of
the system observable $F_X$ under the context of the specific detector
characterized by a specific set of probability observables $\{E_y\}$. The CVs
form a generalized spectrum for the observable since they are associated with
general probability observables for a generalized measurement and not independent
probability observables for a projective measurement; the eigenvalues are a
special case when the probability observables are the spectral projections of the
observable being measured.

With this point of view, we can understand an observable as an equivalence class
of possible measurement strategies for the same average information. That is,
using appropriate pairings of probability observables and CVs, one can measure
the same observable average in many different ways,
$\langle F_X \rangle = \sum_{x \in X} f_X(x)\, P(x) = \sum_{y \in Y} f_Y(y)\, \langle E_y \rangle_X$.
Each such expansion corresponds to a different experimental setup.
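As a hedged illustration of (32), the sketch below solves for the contextual values of a sign observable on a hypothetical two-color system measured by a two-light detector, and then checks that the detector average reproduces the system average. The response functions, the state, and the function name `contextual_values` are all assumptions; the 2x2 system is solved by Cramer's rule, which presumes the response matrix is invertible.

```python
# Hypothetical system atoms, detector atoms, and response functions P(y|x).
X = ["g", "r"]
Y = ["b", "y"]
response = {"g": {"b": 0.6, "y": 0.4},
            "r": {"b": 0.3, "y": 0.7}}
F_X = {"g": +1.0, "r": -1.0}

def contextual_values(f_X):
    """Solve sum_y f_Y(y) P(y|x) = f_X(x) for both atoms x (Cramer's rule)."""
    a, b = response["g"]["b"], response["g"]["y"]
    c, d = response["r"]["b"], response["r"]["y"]
    det = a * d - b * c              # nonzero for a uniquely invertible detector
    return {"b": (f_X["g"] * d - b * f_X["r"]) / det,
            "y": (a * f_X["r"] - c * f_X["g"]) / det}

f_Y = contextual_values(F_X)         # roughly {"b": 3.667, "y": -3.0}

P_X = {"g": 0.25, "r": 0.75}         # assumed, and unknown to the experimenter
P_Y = {y: sum(P_X[x] * response[x][y] for x in X) for y in Y}
system_mean = sum(F_X[x] * P_X[x] for x in X)
detector_mean = sum(f_Y[y] * P_Y[y] for y in Y)
```

The contextual values lie outside the eigenvalue range $[-1, +1]$: the ambiguity of the detector is compensated by amplifying the values assigned to its outcomes.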

Moments. — Similarly, the $n$th statistical moment of an observable can be
measured in many different, yet equivalent, ways. For instance, the $n$th moment
of an observable $F_X$ can be found from the expansion (32) as

$$\langle (F_X)^n \rangle_X = \Big\langle \Big( \sum_{y \in Y} f_Y(y)\, E_y \Big)^n \Big\rangle_X = \sum_{y_1, \ldots, y_n \in Y} f_Y(y_1) \cdots f_Y(y_n)\, \langle E_{y_1} \cdots E_{y_n} \rangle_X. \tag{33}$$

By examining the general collapse rule for measurement sequences (20a) we observe
that the quantity $\langle E_{y_1} \cdots E_{y_n} \rangle_X$ must be the joint
probability for a sequence $(y_1, \ldots, y_n)$ of $n$ noninvasive measurements
that couple the same detector to the system $n$ times in succession. Furthermore,
the average in (33) is explicitly different from the $n$th statistical moment of
the raw detector results,
$\langle (F_Y)^n \rangle = \sum_{y \in Y} (f_Y(y))^n\, P(y)$.

We conclude that, for imperfectly correlated noninvasive detectors, one must
perform measurement sequences to obtain the correct statistical moments of an
observable using a particular set of CVs. Only for unambiguous measurements with
independent probability observables do such measurement sequences reduce to
simple powers of the eigenvalues being averaged with single measurement
probabilities. If a single measurement by the detector is done per trial, then
only the statistical moments of the detector observable $F_Y$ can be inferred
from that set of CVs, as opposed to the true statistical moments of the inferred
system observable $F_X$.

We can, however, change the CVs to define new observables that correspond to
powers of the original observable, such as
$G_X = (F_X)^n = \sum_{y \in Y} g_Y(y)\, E_y$. These new observables can then be
measured indirectly using the same experimental setup without the need for
measurement sequences. The CVs $g_Y(y)$ for the $n$th power of $F_X$ will not be
simple powers of the CVs $f_Y(y)$ for $F_X$ unless the measurement is
unambiguous.
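The three routes to a second moment discussed above can be compared numerically. The sketch assumes the same kind of hypothetical two-color system and two-light detector used for illustration elsewhere; the numbers and helper names are not from the text.

```python
# Hypothetical setup: system atoms, detector atoms, response functions, state.
X = ["g", "r"]
Y = ["b", "y"]
response = {"g": {"b": 0.6, "y": 0.4},
            "r": {"b": 0.3, "y": 0.7}}
P_X = {"g": 0.25, "r": 0.75}
F_X = {"g": +1.0, "r": -1.0}

def contextual_values(f_X):
    """CVs of a system observable for this 2x2 detector (Cramer's rule)."""
    a, b = response["g"]["b"], response["g"]["y"]
    c, d = response["r"]["b"], response["r"]["y"]
    det = a * d - b * c
    return {"b": (f_X["g"] * d - b * f_X["r"]) / det,
            "y": (a * f_X["r"] - c * f_X["g"]) / det}

P_Y = {y: sum(P_X[x] * response[x][y] for x in X) for y in Y}
f_Y = contextual_values(F_X)
g_Y = contextual_values({x: F_X[x] ** 2 for x in X})   # CVs of G_X = (F_X)^2

true_moment = sum(F_X[x] ** 2 * P_X[x] for x in X)
# Second moment of the raw detector results, which is NOT <(F_X)^2>.
naive = sum(f_Y[y] ** 2 * P_Y[y] for y in Y)
# Eq. (33) with n = 2: two successive noninvasive uses of the same detector.
sequence = sum(f_Y[y1] * f_Y[y2]
               * sum(P_X[x] * response[x][y1] * response[x][y2] for x in X)
               for y1 in Y for y2 in Y)
# Single measurement per trial, but with the CVs redefined for G_X.
single_shot = sum(g_Y[y] * P_Y[y] for y in Y)
```

Here `g_Y` comes out as the constant 1 on both outcomes because $(F_X)^2 = 1_X$, which shows directly that the CVs of a power are not the powers of the CVs.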

Invasive measurements. — If the measurement is invasive, then the disturbance
forces us to associate the CVs with the measurement operations
$\{\mathcal{E}_y\}$ and not solely with their associated probability operators
$\{E_y\}$ in order to properly handle measurement sequences as in (26).
Specifically, we must define the observable operation,

$$\mathcal{F}_X = \sum_{y \in Y} f_Y(y)\, \mathcal{E}_y, \tag{34}$$

which produces the identity
$\mathcal{F}_X(1_X) = \sum_{y \in Y} f_Y(y)\, E_y = F_X$ similar to (32).

Correlated sequences of invasive observable measurements can be obtained by
composing the observable operations,

$$\langle (\mathcal{F}_X)^n (1_X) \rangle_X = \sum_{y_1, \ldots, y_n} f_Y(y_1) \cdots f_Y(y_n)\, \langle \mathcal{E}_{y_1}(\mathcal{E}_{y_2}(\cdots(E_{y_n})\cdots)) \rangle_X. \tag{35}$$

Such an $n$-measurement sequence reduces to the $n$th moment (33) when the
disturbance vanishes.

If a time-evolution disturbance $\mathcal{D}_t$ is inserted between different
invasive observable measurements, then we obtain an invasive correlation function
instead,

$$\widetilde{\langle F_X(0)\, G_X(t) \rangle} = \langle \mathcal{F}_X(\mathcal{D}_t(\mathcal{G}_X(1_X))) \rangle_X. \tag{36}$$

When the observable measurements become noninvasive, then this correctly reduces
to the noninvasive correlation function (12). Similarly, $n$-time invasive
correlations can be defined with $n-1$ time-evolution disturbances between the
invasive observable measurements,
$\langle \mathcal{F}_1(\mathcal{D}_{t_1}(\mathcal{F}_2(\cdots \mathcal{D}_{t_{n-1}}(\mathcal{F}_n(1_X)) \cdots))) \rangle_X$.

Conditioned averages. — In addition to statistical moments of the observable, we
can also use the CVs to construct principled conditioned averages of the
observable. Recall that in the general case of an invasive measurement sequence
we can condition the observable measurement in two distinct ways. If we condition
on an outcome $z$ before the measurement of $F_X$ we obtain the preselected
conditioned average $\widetilde{\langle F_X \rangle}_z$ defined in (24). On the
other hand, if the invasive conditioning measurement of $z$ happens after the
invasive observable measurement then we must use the postselected conditional
probabilities (27) to construct a postselected conditioned average,

$${}_z\widetilde{\langle F_X \rangle} = \sum_{y \in Y} f_Y(y)\; {}_z\widetilde{\langle y \rangle} = \frac{\sum_{y \in Y} f_Y(y)\, \langle \mathcal{E}_y(E'_z) \rangle_X}{\sum_{y \in Y} \langle \mathcal{E}_y(E'_z) \rangle_X} = \frac{\langle \mathcal{F}_X(E'_z) \rangle_X}{\langle \mathcal{E}(E'_z) \rangle_X}. \tag{37}$$

The observable operation $\mathcal{F}_X$ and the nonselective measurement
$\mathcal{E}$ encode the relevant details from the first measurement. When the
disturbance to the reduced system state vanishes, both the preselected and the
postselected conditioned averages simplify to the pure conditioned average
$\langle F_X \rangle_z$ defined in (18) that depends only on the system
observable $F_X$.

While the pure conditioned average $\langle F_X \rangle_z$ is independent of the
order of conditioning and is always constrained to the eigenvalue range of the
observable, the postselected invasive conditioned average
${}_z\widetilde{\langle F_X \rangle}$ can, perhaps surprisingly, stray outside the
eigenvalue range with ambiguous measurements. The combination of the amplified
CVs and the disturbance can lead to a postselected average that lies anywhere in
the full CV range, rather than just the eigenvalue range. We will see an example
of this in Sec. III D 2.

Inversion. — So far we have treated the CVs in the expansion (32) as known
quantities. However, for a realistic detector situation, the CVs will need to be
experimentally determined from the characterization of the detector and the
observable that one wishes to measure. The reduced system state $P_X$ will
generally not be known a priori, since the point of a detector is to learn
information about the system in the absence of such prior knowledge. We can still
solve for the CVs without knowledge of the system state, however, since the
probability observables are only specified by the conditional likelihoods
$P(y|x)$ that can be obtained independently from detector tomography.

To solve for the CVs when the system state is presumed unknown, we rewrite (32)
in the form

$$F_X = \sum_{x \in X} \sum_{y \in Y} P(y|x)\, f_Y(y)\, x = \sum_{x \in X} \langle F_Y \rangle_x\, x = S(F_Y), \tag{38}$$

where $S = \sum_{x \in X} \langle \cdot \rangle_x\, x$ is the map that converts
observables in the detector space to observables in the system space,
$S : \Sigma_Y^{\mathbb{R}} \to \Sigma_X^{\mathbb{R}}$. Our goal is to invert this
map and solve for the required spectrum of $F_Y$ given a desired system
observable $F_X$. However, the inverse of such a map is not generally unique; for
it to be uniquely invertible it must be one-to-one between system and detector
spaces of equal size. If the detector space is smaller than the system, then no
exact inverse solutions are possible; it may be possible, however, to find
coarse-grained solutions that lose some information. Perhaps more alarmingly, if
the detector space is larger than the system, then it is possible to have an
infinite set of exact solutions.

When disturbance is taken into account as in (23), the equality (38) becomes

$$F_X = \langle \mathcal{D}(F_Y) \rangle_Y = S(F_Y), \tag{39}$$

so the composition of the disturbance $\mathcal{D}$ and the detector expectation
$\langle \cdot \rangle_Y$ produces the map $S$ that must be inverted. Equation
(38) is a special case when the reduced system state is unchanged by the coupling
disturbance.

Pseudoinversion. — The entire set of possible solutions to (39) may be completely
specified using the Moore-Penrose pseudoinverse of the map $S$, which we denote
as $S^+$. The pseudoinverse is the inverse of the restriction of $S$ to the space
$\Sigma_Y^{\mathbb{R}} \setminus \{F \in \Sigma_Y^{\mathbb{R}} \mid S(F) = 0\}$;
that is, the null space of $S$ is removed from the detector space before
constructing the inverse. We will show a practical method for computing the
pseudoinverse using the singular value decomposition in the examples to follow.

Using the pseudoinverse, all possible solutions of (39) can be written compactly
as

$$F_Y = S^+(F_X) + (I - S^+ S)(G), \tag{40}$$

where $I$ is the identity map and $G \in \Sigma_Y^{\mathbb{R}}$ is an arbitrary
detector observable. The solutions specified by the pseudoinverse in this manner
contain exact inverses and coarse-grainings as special cases.
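The solution family (40) can be sketched for a detector with more outcomes than the system has atoms. The three-light detector and its response functions are hypothetical. Instead of the singular value decomposition mentioned above, this sketch uses the closed form $S^+ = M^T (M M^T)^{-1}$, which coincides with the Moore-Penrose pseudoinverse only under the assumption that the rows of the response matrix $M$ are linearly independent.

```python
# Hypothetical two-color system read out by a three-light detector.
X = ["g", "r"]
Y = ["b", "y", "w"]
response = {"g": {"b": 0.6, "y": 0.3, "w": 0.1},
            "r": {"b": 0.1, "y": 0.3, "w": 0.6}}
F_X = {"g": +1.0, "r": -1.0}

# Matrix of the map S: row x, column y holds P(y|x).
M = [[response[x][y] for y in Y] for x in X]

def min_norm_cvs(M, target):
    """f_Y = M^T (M M^T)^{-1} f_X; assumes the two rows of M are independent."""
    (a, b), (c, d) = [[sum(p * q for p, q in zip(r1, r2)) for r2 in M]
                      for r1 in M]                      # 2x2 Gram matrix M M^T
    det = a * d - b * c
    u = [(d * target[0] - b * target[1]) / det,
         (-c * target[0] + a * target[1]) / det]        # (M M^T)^{-1} f_X
    return [sum(M[i][j] * u[i] for i in range(2)) for j in range(len(M[0]))]

def apply_S(values):
    """System eigenvalues f_X(x) = sum_y P(y|x) f_Y(y)."""
    return [sum(M[i][j] * values[j] for j in range(len(values)))
            for i in range(len(M))]

def norm_sq(values):
    return sum(v * v for v in values)

f_Y = min_norm_cvs(M, [F_X[x] for x in X])      # the G = 0 solution
null_vec = [3.0, -7.0, 3.0]                     # spans the null space of this M
other = [f + 0.1 * n for f, n in zip(f_Y, null_vec)]   # another exact solution
```

Both `f_Y` and `other` reproduce the same system observable under `apply_S`, but adding any null-space component increases the norm of the contextual values.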

Detector variance. — Since $(I - S^+ S)$ is a projection operation to the null
space of $S$, the second term of (40) lives in the null space of $S$ and is
orthogonal to the first term. Therefore, the norm squared of $F_Y$ has the form

$$\|F_Y\|^2 = \sum_y (f_Y(y))^2 = \|S^+(F_X)\|^2 + \|(I - S^+ S)(G)\|^2, \tag{41}$$

making the $G = 0$ solution have the smallest norm.

The norm $\|F_Y\|$ of the CV solution is relevant because the second moment of
the detector observable $F_Y$ is simply bounded by the norm squared,
$\langle (F_Y)^2 \rangle = \sum_y P(y)\, (f_Y(y))^2 \leq \|F_Y\|^2$. The second
moment is similarly an upper bound for the variance of the detector observable,
$\mathrm{Var}(F_Y) = \langle (F_Y)^2 \rangle - \langle F_Y \rangle^2 \leq \langle (F_Y)^2 \rangle$.
Therefore, the norm squared is a reasonable upper bound for the detector variance
that one can make without prior knowledge of the state.

Mean-squared error. — The variance of $F_Y$ governs the mean-squared error of any
estimation of its average with a finite sample, such as an empirically measured
sample in a laboratory. Specifically, one measures a sequence of detector
outcomes of length $n$, $(y_1, y_2, \ldots, y_n)$, and uses this finite sequence
to estimate the average of $F_Y$ via the unbiased estimator,

$$\overline{F_Y} = \frac{1}{n} \sum_{i=1}^{n} f_Y(y_i), \tag{42}$$

that converges to the true mean value
$\langle F_Y \rangle_Y = \langle F_X \rangle_X$ as $n \to \infty$. The mean
squared error of this estimator $\mathrm{MSE}(\overline{F_Y})$ from the true mean
is the variance over the number of trials in the sequence,
$\mathrm{Var}(F_Y)/n$. Hence, the maximum mean squared error for a finite
sequence of length $n$ must be bounded by the norm squared of the CVs divided by
the length of the sequence,

$$\mathrm{MSE}(\overline{F_Y}) = \frac{\mathrm{Var}(F_Y)}{n} \leq \frac{\|F_Y\|^2}{n}. \tag{43}$$

That is, the norm bounds the number of trials necessary to obtain an experimental
estimation of observable averages to a desired precision using the imperfect
detector.
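The bound (43) translates directly into a worst-case trial count. The sketch below assumes a hypothetical set of contextual values and detector probabilities (the same illustrative two-light detector used in other sketches); the function `trials_for` is our own name for the state-independent estimate.

```python
import math

# Hypothetical contextual values and detector outcome probabilities.
f_Y = {"b": 11.0 / 3.0, "y": -3.0}
P_Y = {"b": 0.375, "y": 0.625}

mean = sum(f_Y[y] * P_Y[y] for y in f_Y)
second_moment = sum(f_Y[y] ** 2 * P_Y[y] for y in f_Y)
variance = second_moment - mean ** 2
norm_sq = sum(v ** 2 for v in f_Y.values())     # ||F_Y||^2, state independent

def mse(n):
    """Exact mean squared error of the sample mean after n trials."""
    return variance / n

def mse_bound(n):
    """Eq. (43): bound that requires no knowledge of the system state."""
    return norm_sq / n

def trials_for(target_mse):
    """Trials that guarantee the target MSE using only the norm bound."""
    return math.ceil(norm_sq / target_mse)

n_needed = trials_for(0.01)
```

Because the bound uses only the contextual values, it can be evaluated before any data are taken; the actual error for a given state is generally smaller.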

Pseudoinverse prescription. — Choosing the arbitrary observable to be $G = 0$
therefore not only picks the solution $F_Y = S^+(F_X)$ that is uniquely related
to $F_X$ by discarding the irrelevant null space of $S$, but also picks the
solution with the smallest norm, which places a reasonable upper bound on the
statistical error. Without prior knowledge of the system state, the pseudoinverse
solution does a reasonable job at obtaining an optimal fit to the relation (39).
Moreover, when (39) is not satisfied by the direct pseudoinverse then an exact
solution is impossible, but the pseudoinverse still gives the "best fit"
coarse-graining of an exact solution in the least-squares sense. As such, we
consider the direct pseudoinverse of $F_X$ to be the preferred solution in the
absence of other motivating factors stemming from prior knowledge of the state
being measured.

1. Example: Ambiguous marble detector 

As an illustrative example similar to the one given 
in the introduction, suppose that one wishes to know 
whether the color of a marble is green or red, but one is 
unable to examine the marble directly. Instead, one only 
has a machine that can display a blue light or a yellow 
light after it examines the marble color. In such a case, 
the marble colors are the propositions of interest, but 
the machine lights are the only accessible propositions. 
The lights may be correlated imperfectly with the mar- 
ble color; for instance, if a blue light is displayed one may 
learn something about the possible marble color, but it 
may still be partially ambiguous whether the marble is 
actually green or actually red. 

The relevant Boolean algebra for the system is $\Sigma_X = \{0, g, r, 1_X\}$, where
$g$ is the proposition for the color green, $r$ is the proposition for the color
red, and $1_X = g + r$ is the logical or of the two possible color propositions. We
consider the task of measuring a simple color observable
$F_X = (+1)\,g + (-1)\,r$ that distinguishes the colors with a sign using an
imperfectly correlated detector.

The relevant Boolean algebra for the detector is $\Sigma_Y = \{0, b, y, 1_Y\}$, where
$b$ is the proposition for the blue light, $y$ is the proposition for the yellow
light, and $1_Y = b + y$. In order to measure the marble observable $F_X$ using only
the detector, the experimenter must determine the proper form of the
corresponding detector observable $F_Y$.

First, the experimenter characterizes the detector by sending in known samples
and observing the outputs of the detector. After many characterization trials,
the experimenter determines to some acceptable precision the four conditional
probabilities,



\[ P(b|g) = 0.6, \qquad P(y|g) = 0.4, \tag{44a} \]
\[ P(b|r) = 0.2, \qquad P(y|r) = 0.8, \tag{44b} \]

for the detector outcomes $b$ and $y$ given specific marble preparations $g$ and
$r$. These characterization probabilities completely determine the detector
response in the form of its probability observables (17),

\[ E_b = P(b|g)\,g + P(b|r)\,r, \tag{45a} \]
\[ E_y = P(y|g)\,g + P(y|r)\,r. \tag{45b} \]

By construction, $E_b + E_y = g + r = 1_X$.

Second, the experimenter expands the system observable $F_X$ using the detector
probability observables (45) and unknown contextual values (CVs) $f_Y(b)$ and
$f_Y(y)$,

\[ F_X = (+1)\,g + (-1)\,r = f_Y(b)\,E_b + f_Y(y)\,E_y. \tag{46} \]

After expressing this relation as the equivalent matrix equation,

\[ \begin{pmatrix} +1 \\ -1 \end{pmatrix} = \begin{pmatrix} P(b|g) & P(y|g) \\ P(b|r) & P(y|r) \end{pmatrix} \begin{pmatrix} f_Y(b) \\ f_Y(y) \end{pmatrix}, \tag{47} \]

it can be directly inverted to find the CVs (40),

\[ f_Y(b) = 3, \qquad f_Y(y) = -2. \tag{48} \]

Therefore,

\[ F_X = (+1)\,g + (-1)\,r = (3)\,E_b + (-2)\,E_y, \tag{49} \]

so $F_X$ can be inferred from a measurement of the equivalent detector observable
$F_Y = (3)\,b + (-2)\,y$.

Notably, the CVs are amplified from the eigenvalues of $\pm 1$ due to the ambiguity
of the detector. The amplification compensates for the ambiguity so that the
correct average can be obtained after measuring an ensemble of many unknown
marbles described by the initial marble state $P_X$. The amplification also leads
to a larger upper bound for the variance (41) of the detector,

\[ \|F_Y\|^2 = (3)^2 + (-2)^2 = 13. \tag{50} \]

Hence, we can expect the imperfect detector to display a root-mean-squared (RMS)
error in the reported average color that is no larger than
$\sqrt{13/n} \approx 3.6/\sqrt{n}$ after $n$ repeated measurements. For contrast, a
perfect detector would display an RMS error no larger than
$\sqrt{2/n} \approx 1.4/\sqrt{n}$ after $n$ repeated measurements.

2. Example: Invasive ambiguous detector

The detector apparatus in the last example could be generally invasive. In such
a case, the characterization probabilities (44) composing the probability
observables would be a combination of the initial state of the detector lights
$P_Y$ and a disturbance $\mathcal{D}$ from the measurement coupling according to

\[ P(b|g) = P_Y(b)\,\big(D_{g,b}(gb) + D_{g,b}(rb)\big) + P_Y(y)\,\big(D_{g,y}(gb) + D_{g,y}(rb)\big), \tag{51a} \]
\[ P(y|g) = P_Y(b)\,\big(D_{g,b}(gy) + D_{g,b}(ry)\big) + P_Y(y)\,\big(D_{g,y}(gy) + D_{g,y}(ry)\big), \tag{51b} \]
\[ P(b|r) = P_Y(b)\,\big(D_{r,b}(gb) + D_{r,b}(rb)\big) + P_Y(y)\,\big(D_{r,y}(gb) + D_{r,y}(rb)\big), \tag{51c} \]
\[ P(y|r) = P_Y(b)\,\big(D_{r,b}(gy) + D_{r,b}(ry)\big) + P_Y(y)\,\big(D_{r,y}(gy) + D_{r,y}(ry)\big), \tag{51d} \]

where we have used the marginalization identity
$D_{c,d}(b) = D_{c,d}(gb) + D_{c,d}(rb)$ for $c \in \{g, r\}$ and $d \in \{b, y\}$. For a
noninvasive detector, the transition probabilities that involve marbles changing
color must be zero,
$D_{g,b}(rb) = D_{g,b}(ry) = D_{g,y}(ry) = D_{g,y}(rb) = D_{r,b}(gb) = D_{r,b}(gy) = D_{r,y}(gb) = D_{r,y}(gy) = 0$.
However, they need not be zero for a general invasive detector.

As an example, suppose that the initial detector state is unbiased,
$P_Y(b) = P_Y(y) = 1/2$, and that the detector has a 10% chance of flipping the color
of a given marble. The following possible values for the sixteen transition
probabilities would then lead to the same effective characterization
probabilities (44) as before,

\[ D_{g,b}(gb) = 0.5, \qquad D_{g,y}(gb) = 0.5, \tag{52a} \]
\[ D_{g,b}(gy) = 0.3, \qquad D_{g,y}(gy) = 0.3, \tag{52b} \]
\[ D_{r,b}(rb) = 0.1, \qquad D_{r,y}(rb) = 0.1, \tag{52c} \]
\[ D_{r,b}(ry) = 0.7, \qquad D_{r,y}(ry) = 0.7, \tag{52d} \]
\[ D_{g,b}(rb) = 0.1, \qquad D_{g,y}(rb) = 0.1, \tag{52e} \]
\[ D_{g,b}(ry) = 0.1, \qquad D_{g,y}(ry) = 0.1, \tag{52f} \]
\[ D_{r,b}(gb) = 0.1, \qquad D_{r,y}(gb) = 0.1, \tag{52g} \]
\[ D_{r,b}(gy) = 0.1, \qquad D_{r,y}(gy) = 0.1. \tag{52h} \]

Since the effective characterization probabilities are the same, the probability
observables are the same as (45), leading to the same CVs (48) to measure the
observable $F_X = (+1)\,g + (-1)\,r$.

The disturbance of the reduced marble state will become apparent only when
making a second measurement after the first one. Suppose we make a second
measurement of the marble colors $g$ and $r$ directly. The probability of
obtaining a detector outcome $d \in \{b, y\}$ and then observing a specific marble
color $c \in \{g, r\}$ will then be
$P_X(g)\big(P_Y(b)\,D_{g,b}(cd) + P_Y(y)\,D_{g,y}(cd)\big) + P_X(r)\big(P_Y(b)\,D_{r,b}(cd) + P_Y(y)\,D_{r,y}(cd)\big)$.
If we define an operation as in (25) to be,

\[ \mathcal{E}_d(c) = g\,\big(P_Y(b)\,D_{g,b}(cd) + P_Y(y)\,D_{g,y}(cd)\big) + r\,\big(P_Y(b)\,D_{r,b}(cd) + P_Y(y)\,D_{r,y}(cd)\big), \tag{53} \]

then we can express the probability for the sequence compactly as
$\langle \mathcal{E}_d(c) \rangle_X$.

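Averaging the outcomes for the detector lights using the CVs (35) and then
conditioning on a particular marble color $c$ in the second measurement produces
a postselected conditioned average of the marble colors (37) as reported by the
invasive ambiguous detector,

\[ {}_c\langle F_X \rangle = \frac{f_Y(b)\,\langle \mathcal{E}_b(c) \rangle_X + f_Y(y)\,\langle \mathcal{E}_y(c) \rangle_X}{\langle \mathcal{E}_b(c) \rangle_X + \langle \mathcal{E}_y(c) \rangle_X}. \tag{54} \]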



If we also preselect the marbles to be a particular color, 
we can compute the pre- and postselected conditioned 
averages of the marble colors as reported by the invasive 
ambiguous detector from ([35]), ([5^ . and ([5^ . 



\[ {}_g\langle F_X \rangle_g = 1.125, \tag{55a} \]
\[ {}_r\langle F_X \rangle_g = 0.5, \tag{55b} \]
\[ {}_g\langle F_X \rangle_r = 0.5, \tag{55c} \]
\[ {}_r\langle F_X \rangle_r = -1.375. \tag{55d} \]

Due to a combination of the invasiveness and the ambiguity of the measurement,
the postselected conditioned averages can stray outside the eigenvalue range
$[-1, 1]$ for the observable $F_X$. However, they remain within the CV range
$[-2, 3]$. When the measurement is noninvasive, then the pre- and postselected
conditioned averages in (55) that remain well-defined reduce to the pure
conditioned averages ${}_g\langle F_X \rangle_g = 1$ and ${}_r\langle F_X \rangle_r = -1$.
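
The numbers quoted in the two marble examples above can be checked with a few lines of NumPy. The sketch below is only an illustration under stated assumptions: the array layout, the dictionary `joint`, and the helper `conditioned_average` are hypothetical names, and the transition probabilities are the ones listed in (52) with an unbiased initial detector state.

```python
import numpy as np

# Characterization probabilities (44): rows are marbles (g, r), columns are lights (b, y).
S = np.array([[0.6, 0.4],
              [0.2, 0.8]])
f_x = np.array([1.0, -1.0])            # eigenvalues of F_X for (g, r)

cvs = np.linalg.solve(S, f_x)          # contextual values (f_Y(b), f_Y(y))
norm_sq = float(cvs @ cvs)             # variance bound ||F_Y||^2

# Transition probabilities (52). Since D_{c,b} = D_{c,y} there, averaging over the
# unbiased initial light P_Y(b) = P_Y(y) = 1/2 leaves the values unchanged.
# joint[pre][post] = (probability of blue, probability of yellow).
joint = {
    "g": {"g": (0.5, 0.3), "r": (0.1, 0.1)},
    "r": {"g": (0.1, 0.1), "r": (0.1, 0.7)},
}

def conditioned_average(pre, post):
    """CV-weighted light average for marbles prepared as `pre` and later found as `post`."""
    p = np.array(joint[pre][post])
    return float(cvs @ p / p.sum())

for pre in "gr":
    for post in "gr":
        print(pre, post, conditioned_average(pre, post))
```

The division by `p.sum()` is the normalization by the total probability of the postselected color, which is what keeps each conditioned average inside the CV range.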



3. Example: Redundant ambiguous detector 

Consider a similar marble detection setup to the pre- 
vious examples, but where the detector apparatus has 
three independent outcome lights: blue, yellow, and 
purple. Hence, the detector Boolean algebra is
$\Sigma_Y = \{0, b, y, p, b + y, b + p, y + p, 1_Y\}$, where $p$ is the new proposition
for the purple light, and $1_Y = b + y + p$. After characterizing the detector the
experimenter finds the conditional probabilities,

\[ P(b|g) = 0.5, \qquad P(y|g) = 0.3, \qquad P(p|g) = 0.2, \tag{56a} \]
\[ P(b|r) = 0.1, \qquad P(y|r) = 0.7, \qquad P(p|r) = 0.2, \tag{56b} \]

that define the probability observables,

\[ E_b = P(b|g)\,g + P(b|r)\,r, \tag{57a} \]
\[ E_y = P(y|g)\,g + P(y|r)\,r, \tag{57b} \]
\[ E_p = P(p|g)\,g + P(p|r)\,r. \tag{57c} \]

By construction, $E_b + E_y + E_p = 1_X$. Furthermore,
$E_p = (0.2)\,1_X$, so the purple outcome cannot distinguish
whether the marble is green or red and can be imagined 
as a generic detector malfunction outcome. 

The experimenter now has a choice for how to assign CVs to a detector observable
$F_Y$ in order to infer the marble observable $F_X = (+1)\,g + (-1)\,r$. A simple
choice is to ignore the redundant (and nondistinguishing) purple outcome by
zeroing out its CV, $f_Y(p) = 0$, and then invert the remaining relationship
analogously to (47) to find $f_Y(b) = 3.125$ and $f_Y(y) = -1.875$. The variance
bound for this simple choice is $\|F_Y\|^2 = 13.2813$, leading to a
root-mean-squared error no larger than $\sqrt{13.2813/n} \approx 3.6/\sqrt{n}$ after
$n$ repeated measurements.

However, a better choice is to find the preferred values for all three outcomes
using the pseudoinverse (40) of the map between $F_Y$ and $F_X$. To do this, we
write a matrix equation similar to (47) that uses all three outcomes,

\[ \begin{pmatrix} +1 \\ -1 \end{pmatrix} = S \begin{pmatrix} f_Y(b) \\ f_Y(y) \\ f_Y(p) \end{pmatrix}, \tag{58a} \]
\[ S = \begin{pmatrix} P(b|g) & P(y|g) & P(p|g) \\ P(b|r) & P(y|r) & P(p|r) \end{pmatrix}. \tag{58b} \]

The pseudoinverse can be constructed by using the singular value decomposition,
$S = U \Sigma V^T$, where $U$ is an orthogonal matrix composed of the normalized
eigenvectors of $S S^T$, $V$ is an orthogonal matrix composed of the normalized
eigenvectors of $S^T S$, and $\Sigma$ is a diagonal matrix composed of the singular
values of $S$ (which are the square roots of the eigenvalues of $S S^T$ and
$S^T S$). After computing the singular value decomposition, the pseudoinverse can
be constructed as $S^+ = V \Sigma^+ U^T$, where $\Sigma^+$ is the diagonal matrix
constructed by inverting all nonzero elements of $\Sigma^T$. Performing this
inversion we find the following preferred CVs,

\[ \begin{pmatrix} f_Y(b) \\ f_Y(y) \\ f_Y(p) \end{pmatrix} = \frac{5}{36} \begin{pmatrix} 15 & -7 \\ -3 & 11 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} +1 \\ -1 \end{pmatrix} = \frac{5}{18} \begin{pmatrix} 11 \\ -7 \\ 1 \end{pmatrix} \tag{59a} \]
\[ \approx \begin{pmatrix} 3.05 \\ -1.94 \\ 0.27 \end{pmatrix}. \tag{59b} \]

This preferred solution has the smallest variance bound of $\|F_Y\|^2 = 13.1944$.

We find (perhaps counterintuitively) that even though 
the purple outcome itself cannot distinguish the marble 
color, the fact that one obtains a purple outcome at all 
still provides some useful information to the experimenter 
due to the asymmetry of the blue and yellow outcomes. 
Indeed, if for the red marble we instead found the symmetric detector response
$P(b|r) = 0.3$, $P(y|r) = 0.5$, and $P(p|r) = 0.2$, the pseudoinverse would produce
the preferred CVs $f_Y(b) = 5$, $f_Y(y) = -5$, and $f_Y(p) = 0$, indicating that the
purple outcome was truly noninformative.

A less principled approach to solving (58) would be for the experimenter to
assign a completely arbitrary value to one outcome, like $f_Y(b) = B$. The CV
relation still produces a matrix equation,

\[ \begin{pmatrix} +1 - B\,P(b|g) \\ -1 - B\,P(b|r) \end{pmatrix} = \begin{pmatrix} P(y|g) & P(p|g) \\ P(y|r) & P(p|r) \end{pmatrix} \begin{pmatrix} f_Y(y) \\ f_Y(p) \end{pmatrix}, \tag{60} \]

that can be solved to find,

\[ f_Y(y) = B - 5, \qquad f_Y(p) = 12.5 - 4B. \tag{61} \]

The bound for the variance of this solution is
$\|F_Y\|^2 = 18B^2 - 110B + 181.25 \ge 13.1944$; the value of $B$ that minimizes the
bound is $B \approx 3.05$, which recovers the pseudoinverse solution.

Although picking an arbitrary solution gives mathe- 
matically equivalent results, the experimenter will only 
increase the norm of the solution without any physical 
motivation. As such, the higher moments of the detector 
observable Fy can be correspondingly larger, and more 
trials may be necessary for the estimated average of the 
system observable Fx to reach the desired precision. 
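
The comparison between the pseudoinverse and an arbitrary assignment can be reproduced numerically. The following is a minimal sketch using `numpy.linalg.pinv`; the helper names `cvs_with_fixed_blue` and `norm_sq` are hypothetical and exist only for this illustration, with the one-parameter family taken from (61).

```python
import numpy as np

# Characterization matrix (58b): rows are marbles (g, r), columns are lights (b, y, p).
S = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.7, 0.2]])
f_x = np.array([1.0, -1.0])

# Least-norm contextual values from the Moore-Penrose pseudoinverse.
cvs_pinv = np.linalg.pinv(S) @ f_x

def cvs_with_fixed_blue(B):
    """One-parameter family (61): fix f_Y(b) = B and solve for the other two CVs."""
    return np.array([B, B - 5.0, 12.5 - 4.0 * B])

def norm_sq(cvs):
    """Variance bound ||F_Y||^2 for a given CV assignment."""
    return float(cvs @ cvs)

print("pseudoinverse CVs:", cvs_pinv, "bound:", norm_sq(cvs_pinv))
for B in (0.0, 3.125, 110.0 / 36.0, 5.0):
    cvs = cvs_with_fixed_blue(B)
    print(B, cvs, norm_sq(cvs), np.allclose(S @ cvs, f_x))
```

Every member of the family satisfies the CV relation exactly, so the choice of $B$ only affects the norm; $B = 3.125$ reproduces the simple choice with $f_Y(p) = 0$.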



4. Example: Continuous Detector

Consider the extreme example of a marble color de- 
tector that has a continuum of outcomes, such as the 
position of impact of a marble on a continuous screen. 
In such a case, the detector sample space $Y$ is indexed by a real parameter
$y \in \mathbb{R}$, and the relevant Boolean algebra $\Sigma_Y$ can be chosen to be the
set of all Borel subsets of the real line.

After characterizing the detector, the experimenter finds that the detector
displaces its initial probability distribution $dP_Y(y) = p_Y(y)\,dy$ by an amount
$z$ from the zero-point according to which marble color is sent into the
detector,

\[ dP(y|g) = dP_Y(y - z), \qquad dP(y|r) = dP_Y(y + z). \tag{62} \]

These probabilities define the probability observables,

\[ dE(y) = g\,dP(y|g) + r\,dP(y|r), \tag{63} \]

such that $\int_{\mathbb{R}} dE(y) = 1_X$.

To infer information about the marble observable $F_X$ using this detector, the
experimenter must assign a continuum of CVs $f_Y(y)$ such that,

\[ F_X = (+1)\,g + (-1)\,r = \int_{\mathbb{R}} f_Y(y)\,dE(y), \tag{64} \]

or in matrix form,

\[ \begin{pmatrix} +1 \\ -1 \end{pmatrix} = S[f_Y] = \begin{pmatrix} \int_{\mathbb{R}} f_Y(y)\,dP_Y(y - z) \\ \int_{\mathbb{R}} f_Y(y)\,dP_Y(y + z) \end{pmatrix}. \tag{65} \]

Since $f_Y$ is a function, $S$ is a vector-valued functional, which is why we
adopt the square-bracket notation.
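
To make the functional $S$ concrete, the sketch below discretizes $y$ on a uniform grid and computes the least-norm solution of (65) with a finite pseudoinverse. The Gaussian profile for $p_Y$, the width `sigma`, the shift `z`, and the grid are all assumptions made for illustration; the result is compared with the closed form $[p_Y(y - z) - p_Y(y + z)]/[a - b(z)]$ that the text arrives at in (75).

```python
import numpy as np

sigma, z = 1.0, 0.5                         # assumed detector width and shift
y, dy = np.linspace(-12.0, 12.0, 4801, retstep=True)

def p_y(u):
    """Assumed Gaussian detector density p_Y (zero mean, width sigma)."""
    return np.exp(-u**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Discretized map S[f] = (sum_i f_i p_Y(y_i - z) dy, sum_i f_i p_Y(y_i + z) dy).
S = np.vstack([p_y(y - z), p_y(y + z)]) * dy
f_num = np.linalg.pinv(S) @ np.array([1.0, -1.0])   # least-norm CVs on the grid

# Autocorrelation a and overlap b(z), evaluated with the same Riemann sum.
a = np.sum(p_y(y) ** 2) * dy
b = np.sum(p_y(y + z) * p_y(y - z)) * dy
f_closed = (p_y(y - z) - p_y(y + z)) / (a - b)

norm_sq = np.sum(f_num ** 2) * dy           # discretized ||f_Y||^2
print(a, b, norm_sq, 2.0 / (a - b))
```

On a uniform grid the Euclidean least-norm solution coincides with the least-norm solution under $\int f^2\,dy$, which is why an ordinary matrix pseudoinverse suffices here.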



In this case, the detector outcomes are overwhelmingly redundant. However, we
can pick the least-norm solution using the pseudoinverse of the map $S$ as
before. To do so, we first calculate $S S^T$,

\[ S^T = \begin{pmatrix} p_Y(y - z) & p_Y(y + z) \end{pmatrix}, \tag{66a} \]
\[ S S^T = \begin{pmatrix} a & b(z) \\ b(z) & a \end{pmatrix}, \tag{66b} \]

where,

\[ a = \int p_Y(y)\,dP_Y(y) = \int p_Y^2(y)\,dy, \tag{67a} \]
\[ b(z) = \int p_Y(y + z)\,p_Y(y - z)\,dy, \tag{67b} \]

and we find its eigenvalues of $a + b(z)$ with corresponding normalized
eigenvector $(1, 1)/\sqrt{2}$ and $a - b(z)$ with corresponding normalized
eigenvector $(-1, 1)/\sqrt{2}$. We can then construct the orthogonal matrix
$\mathcal{U}$ composed of the normalized eigenvectors of $S S^T$ and the diagonal
matrix $\Sigma$ composed of the square roots of the eigenvalues of $S S^T$,

\[ \mathcal{U} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \tag{68} \]
\[ \Sigma = \begin{pmatrix} \sqrt{a + b(z)} & 0 \\ 0 & \sqrt{a - b(z)} \end{pmatrix}. \tag{69} \]

Next we calculate the relevant eigenfunctions of $S^T S$ that correspond to the
same nonzero eigenvalues $a \pm b(z)$ of $S S^T$; the remaining eigenfunctions
belong to the null space of $S$ and do not contribute. Specifically, we have,

\[ S^T S[h](y) = p_Y(y - z) \int h(y')\,dP_Y(y' - z) + p_Y(y + z) \int h(y')\,dP_Y(y' + z), \tag{70} \]

where $h$ is an arbitrary function. Then the equations,

\[ S^T S[v_+](y) = (a + b(z))\,v_+(y), \tag{71a} \]
\[ S^T S[v_-](y) = (a - b(z))\,v_-(y), \tag{71b} \]

define the normalized eigenfunctions,

\[ v_+(y) = \frac{p_Y(y - z) + p_Y(y + z)}{\sqrt{2(a + b(z))}}, \tag{72a} \]
\[ v_-(y) = \frac{-p_Y(y - z) + p_Y(y + z)}{\sqrt{2(a - b(z))}}, \tag{72b} \]

which allows us to construct the relevant part of the orthogonal map
$\mathcal{V}^T$,

\[ \mathcal{V}^T[h] = \begin{pmatrix} \int v_+(y)\,h(y)\,dy \\ \int v_-(y)\,h(y)\,dy \end{pmatrix}, \tag{73} \]

completing the nonzero part of the singular value decomposition of
$S = \mathcal{U} \Sigma \mathcal{V}^T$.

Finally, we construct the pseudoinverse,

\[ S^+ = \mathcal{V} \Sigma^+ \mathcal{U}^T = \begin{pmatrix} \dfrac{v_+(y)}{\sqrt{2(a + b(z))}} - \dfrac{v_-(y)}{\sqrt{2(a - b(z))}} & \dfrac{v_+(y)}{\sqrt{2(a + b(z))}} + \dfrac{v_-(y)}{\sqrt{2(a - b(z))}} \end{pmatrix}, \tag{74} \]

and solve for the CVs,

\[ f_Y(y) = S^+ \begin{pmatrix} +1 \\ -1 \end{pmatrix} = \frac{p_Y(y - z) - p_Y(y + z)}{a - b(z)}, \tag{75} \]

where $a$ and $b(z)$ are as defined in (67).

The pseudoinverse solution (75) contains only the physically relevant detector
state density $p_Y$ and provides direct physical intuition about the detection
process. Namely, everything in the shifted distribution corresponding to the
green marble, $p_Y(y - z)$, is associated with the eigenvalue $+1$, while
everything in the shifted distribution corresponding to the red marble,
$p_Y(y + z)$, is associated with the eigenvalue $-1$. The overall amplification
factor $a - b(z)$ indicates the discrepancy between the overlap of the shifted
distributions and the distribution autocorrelation. The more the shifted
distributions overlap, the more ambiguous the measurement will be, so the
amplification factor makes the CVs larger to compensate. If the shifted
distributions do not overlap, then $b(z) = 0$ and the only amplification comes
from the autocorrelation $a$ that indicates the ambiguity of the intrinsic
profile of the detector state. Moreover, the support of the CVs is equal to the
support of both shifted detector distributions, which is physically satisfying.

The bound for the detector variance using the pseudoinverse solution is
$\|f_Y\|^2 = 2/[a - b(z)]$, which depends solely on the amplification factor in the
denominator. If the measurement is strong, such that $a - b(z) = 1$, then the
variance bound reduces to the ideal variance bound of 2, as expected, leading to
a maximum RMS error of $\sqrt{2/n}$. Any additional ambiguity of the measurement
stemming from distribution overlap or distributed autocorrelation amplifies the
maximum RMS error by a factor of $1/\sqrt{a - b(z)}$.

Contrast these preferred values with the generic linear solution
$f_Y(y) = y/z$, which also satisfies (64) when $P_Y$ is symmetric about its mean
[l3, [H, |4§| . While the generic solution could be argued to be simpler in form,
it provides no information about the detector and provides no physical insight
into the meaning or origin of the values themselves. It has nonzero support in
areas where the detector has zero support and even gets progressively larger in
regions that will not contribute to the average. Moreover, the bound for the
detector variance diverges, indicating that the RMS error can in principle be
unbounded. Hence, despite the mathematical equivalence, the linear solution is
physically inferior as a solution when compared to the pseudoinverse (75).


III. QUANTUM PROBABILITY THEORY

To transition from the classical theory of probability 
to the quantum theory we shall take a somewhat uncon- 
ventional approach that leverages what we have already 
derived in the classical theory. Specifically, we shall con- 
struct the quantum theory as a superstructure over the 
existing classical theory, rather than developing it as an 
independent logical system HJTOI or as a restriction of 
a larger classical theory f&j. |65| . This approach serves 
to illustrate the myriad similarities between the quan- 
tum and classical theories, while also highlighting their 
key differences. We shall see that the contextual-value
formalism is essentially unchanged, despite the modifi- 
cations that must be made to the operational theory of 
measurement. 



A. Sample spaces and observables 

Quantum sample space. — The quantum theory of prob- 
ability forms a superstructure on the classical theory of 
probability in the following sense: given a classical sam- 
ple space X, the corresponding quantum sample space 
can be obtained as the orbit of X under the action of 
the special unitary group of rotations. That is, the en- 
tire classical sample space X can be rotated to a different 
classical sample space X' = U{X) with some special uni- 
tary rotation U. We call each classical sample space gen- 
erated in this fashion a framework to be consistent with 
other recent work [tJ- The collection of all such contin- 
uously connected classical sample spaces is the quantum 
sample space, which we will notate as Q{X) to emphasize 
that it can be generated from X. 

Representation. — If the sample space X is represented 
as a set of orthogonal rank-1 projections $\{|x\rangle\langle x|\}$ on a
Hilbert space, the rotated sample space $X' = \mathcal{U}(X)$ will
be represented by a different set of orthogonal projections
$\{\mathcal{U}(|x\rangle\langle x|)\}$ on the same Hilbert space. Any such rotation
lA can be given a spinor representation (see, e.g., (TSl - lsoj ) 
as a two-sided product with a rotor $U$ belonging to the special unitary group,
such that $U^\dagger U = U U^\dagger = 1_X$, and $(U^\dagger)^\dagger = U$. The
involution ($\dagger$) is the adjoint with respect to the inner product of the
Hilbert space. While the projections $\{|x\rangle\langle x|\}$ correspond to
subspaces spanned by vectors $\{|x\rangle\}$ in the Hilbert space, the rotated
projections $\{U^\dagger |x\rangle\langle x| U\}$ correspond to subspaces spanned by
rotated vectors $\{U^\dagger |x\rangle\}$. In what follows we shall tend to use the
shorter algebraic notation $x$ and adopt the equivalent Hilbert space notation
$|x\rangle\langle x|$ only when it readily simplifies expressions.

Since the Hilbert space representation of a unitary ro- 
tor U generally contains complex numbers in order to 
satisfy the special unitary group relations, the Hilbert 
space also becomes complex. However, it is important 
to note that the complex structure arises solely from the 
representation of the unitary rotations that specify the 
relative framework orientations and will not appear directly in any calculable
quantity to follow 'sil]. The representation of the quantum sample space $Q(X)$
therefore
consists of all possible rank-1 projections on the complex 
Hilbert space in which the classical sample space X is 
represented. 

Quantum observables. — Each classical framework X 
has an associated Boolean algebra $\Sigma_X$ and space of observables
$\Sigma_X^{\mathbb{R}}$ exactly as previously discussed. The space of quantum
observables is the collection of all classical observables that are
independently constructed in all the classical frameworks in $Q(X)$. We will
denote this space as $\Sigma_{Q(X)}^{\mathbb{R}}$. Quantum observables are therefore
constructed entirely with real numbers that have empirical meaning for a
laboratory setting; hence, their representations on a complex Hilbert space will
be Hermitian operators.

For observables in the same framework $A, B \in \Sigma_X^{\mathbb{R}}$, we find that
$\mathcal{U}(A)\,\mathcal{U}(B) = U^\dagger A U\,U^\dagger B U = U^\dagger A B U = \mathcal{U}(AB)$,
meaning that the rotations preserve their algebraic structure. As a corollary,
all observables in $\Sigma_{Q(X)}^{\mathbb{R}}$ can be obtained by rotating observables
constructed in a single framework; hence, our previous discussion of observables
carries over to the quantum theory essentially unaltered.

Furthermore, the independence of the propositions in 
a framework X remains unaltered by unitary rotation, 
so every other framework X' has the same number of 
independent propositions. Thus, the number of indepen- 
dent propositions is an invariant known as the quantum 
dimension] for a representation it fixes the dimension of 
the Hilbert space. Similarly, the identity and zero observ- 
ables are invariants, so are the same in every framework 
and unique in the quantum observable algebra. 

Since each different framework forms a separate well- 
behaved classical sample space, the entire preceding dis- 
cussion about classical probability theory applies unal- 
tered when restricted to a particular framework in the 
quantum theory. All observables constructed in a par- 
ticular framework will commute with each other. We 
expect distinctly quantum features to appear only when 
comparing elements from different frameworks. 

Noncommutativity. — The unitary rotations U are gen- 
erally noncommutative and so introduce noncommuta- 
tivity into the quantum theory that is not present in 
the classical theory. Specifically, given $A, B \in \Sigma_X^{\mathbb{R}}$,
$A' = \mathcal{U}(A)$, and $B' = \mathcal{V}(B)$, then
$A'B' = U^\dagger A U\,V^\dagger B V \ne B'A'$, since $U$ and $V$ do not necessarily
commute with each other or with $A$ and $B$. Such noncommutativity is
a manifestation of the fact that the Boolean algebras cor- 
responding to different frameworks are incompatible with 
each other; propositions from one framework cannot form 
a Boolean logical and with propositions from a different 
framework. We shall see in the next section, however, 
that the notion of disturbance followed by a logical and 
can be generalized to the noncommutative setting in the 
form of the projection postulate. 

Disturbance. — All nonconditioning disturbances $\mathcal{D}$ in the quantum
theory also take the form of unitary rotations $\mathcal{U}$. Indeed, we shall see
that the parallels between the quantum theory and the classical theory with
disturbance are quite strong when one interprets all unitary rotations as a form
of classical disturbance.

Time Evolution. — As an example, the continuous time- 
evolution of a closed quantum system is specified by a 
disturbance in the form of a unitary rotation $\mathcal{U}_t$ with corresponding
rotor $U_t$, known as a propagator. For nonrelativistic quantum mechanics, the
time dependence of the rotor is specified by the Schrödinger equation,
$\partial_t U_t = (H/i\hbar)\,U_t$, and a Hamiltonian observable $H$ that
generates the time translation. We are not concerned
with the (well-established) details of continuous time- 
evolution in this paper, so we will treat any unitary ro- 
tations as given in what follows. 



1. Example: Polarization 

As an example quantum system we shall pick the sim- 
plest possible nontrivial system: a qubit. Specifically, we 
will consider the polarization degree of freedom of a laser 
beam. Suppose we are interested in measuring the linear 
polarization of the beam with respect to the surface of an 
optical table. We denote the polarization direction par- 
allel to the table as "horizontal" (h) and the direction 
perpendicular to the table as "vertical" (v). Although 
we casually refer to the polarizations h and v as if they 
were properties of the light beam, the propositions h and 
$v$ operationally refer to two independent outcomes of a
polarization distinguishing device, such as a polarizing 
beam splitter, that can be implemented in the labora- 
tory. 

The two orthogonal polarizations form a classical sample space $X = \{h, v\}$ and
a classical Boolean algebra $\Sigma_X = \{0, h, v, 1_X\}$, where $1_X = h + v$,
similar to the classical sample space for the marble colors considered in
Sec. HID II By extending the Boolean algebra over the 
reals to $\Sigma_X^{\mathbb{R}}$ as before we can define classical observables
$F_X = a\,h + b\,v$ in this sample space, such as the Stokes observable
$S_X = h - v$ that distinguishes the polarizations with a sign.

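We can represent the commutative observable algebra $\Sigma_X^{\mathbb{R}}$ as
diagonal $2 \times 2$ matrices,

\[ h = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad v = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \tag{76} \]

which can also be understood as commuting Hermitian operators over a
two-dimensional Hilbert space. The atomic propositions $h = |h\rangle\langle h|$ and
$v = |v\rangle\langle v|$ are projectors that correspond to disjoint subspaces
spanned by the orthonormal Jones' polarization basis for the Hilbert space,

\[ |h\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |v\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{77} \]

To obtain the full quantum sample space $Q(X)$ from $X$, we introduce the group of
possible polarization rotations. Algebraically, an arbitrary rotation
$\mathcal{U}(F_X) = U^\dagger F_X U$ can be readily understood in terms of its rotor
$U$, which is an element of the group SU(2) and can be parametrized, for example,
in terms of the Cartan decomposition
$U_{\alpha,\beta,\gamma} = \exp(i\alpha\sigma_z/2)\,\exp(i\beta\sigma_y/2)\,\exp(i\gamma\sigma_z/2)$,
which for a qubit happens to correspond to an Euler angle decomposition of a
three-dimensional rotation. Here $i\sigma_z$ and $-i\sigma_y$ are two of the three
generators of the Lie algebra for SU(2) in terms of the standard Pauli matrices,

\[ \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{78} \]

Since the group generators have a complex representation, the unitary rotation
$\mathcal{U}_{\alpha,\beta,\gamma}$ will also have a complex representation in the
Hilbert space,

\[ U_{\alpha,\beta,\gamma} = e^{i\alpha\sigma_z/2}\,e^{i\beta\sigma_y/2}\,e^{i\gamma\sigma_z/2} \tag{79a} \]
\[ = \begin{pmatrix} e^{i\alpha/2} & 0 \\ 0 & e^{-i\alpha/2} \end{pmatrix} \begin{pmatrix} \cos\frac{\beta}{2} & \sin\frac{\beta}{2} \\ -\sin\frac{\beta}{2} & \cos\frac{\beta}{2} \end{pmatrix} \begin{pmatrix} e^{i\gamma/2} & 0 \\ 0 & e^{-i\gamma/2} \end{pmatrix} \tag{79b} \]
\[ = \begin{pmatrix} e^{i(\alpha+\gamma)/2} \cos\frac{\beta}{2} & e^{i(\alpha-\gamma)/2} \sin\frac{\beta}{2} \\ -e^{-i(\alpha-\gamma)/2} \sin\frac{\beta}{2} & e^{-i(\alpha+\gamma)/2} \cos\frac{\beta}{2} \end{pmatrix}. \tag{79c} \]

The algebraic involution ($\dagger$) is the complex transpose in the matrix
representation.

Physically, the factor $\exp(i\beta\sigma_y/2)$ corresponds to a rotation of the
apparatus around the axis of the light beam by an angle $\beta/2$, while the
factors $\exp(i\alpha\sigma_z/2)$ and $\exp(i\gamma\sigma_z/2)$ correspond to the
action of phase plates that shift the relative phases of $h$ and $v$ by $\alpha/2$
and $\gamma/2$, respectively. Hence, the ubiquitous quantum phase also appears as
a consequence of the unitary rotations.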

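Using the unitary rotations, we can generate other incompatible frameworks
$\mathcal{U}_{\alpha,\beta,\gamma}(X) = \{\mathcal{U}_{\alpha,\beta,\gamma}(h), \mathcal{U}_{\alpha,\beta,\gamma}(v)\}$
in $Q(X)$,

\[ \mathcal{U}_{\alpha,\beta,\gamma}(h) = U^\dagger_{\alpha,\beta,\gamma}\,h\,U_{\alpha,\beta,\gamma} = \frac{1}{2} \begin{pmatrix} 1 + \cos\beta & e^{-i\gamma} \sin\beta \\ e^{i\gamma} \sin\beta & 1 - \cos\beta \end{pmatrix}, \tag{80a} \]
\[ \mathcal{U}_{\alpha,\beta,\gamma}(v) = U^\dagger_{\alpha,\beta,\gamma}\,v\,U_{\alpha,\beta,\gamma} = \frac{1}{2} \begin{pmatrix} 1 - \cos\beta & -e^{-i\gamma} \sin\beta \\ -e^{i\gamma} \sin\beta & 1 + \cos\beta \end{pmatrix}, \tag{80b} \]

which depend solely on the two parameters $\beta$ and $\gamma$. The atomic
propositions of such a rotated framework are projectors corresponding to each
disjoint subspace spanned by a rotated orthonormal Jones' polarization basis,

\[ U^\dagger_{\alpha,\beta,\gamma}\,|h\rangle = \begin{pmatrix} e^{-i(\alpha+\gamma)/2} \cos\frac{\beta}{2} \\ e^{-i(\alpha-\gamma)/2} \sin\frac{\beta}{2} \end{pmatrix}, \tag{81a} \]
\[ U^\dagger_{\alpha,\beta,\gamma}\,|v\rangle = \begin{pmatrix} -e^{i(\alpha-\gamma)/2} \sin\frac{\beta}{2} \\ e^{i(\alpha+\gamma)/2} \cos\frac{\beta}{2} \end{pmatrix}. \tag{81b} \]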

Physically, one could in principle construct an appa- 
ratus corresponding to such a rotated framework using 
three laboratory elements: (1) attach a tunable phase 
plate to the incident port of a polarizing beam splitter 
with the fast axis aligned to the table, (2) rotate both the 
beam splitter and attached phase plate with respect to 
the table, and (3) attach a second tunable phase plate 
to the incident port of the first phase plate with the 
fast axis aligned to the table. Of course this is only one 
possible parametrization for the unitary rotations; other 
parametrizations will correspond to other experimental 
implementations . 

It follows that any observable in the full quantum observable space
$\Sigma_{Q(X)}^{\mathbb{R}}$ can be obtained by rotating a classical observable
$F_X = a\,h + b\,v$ to the appropriate framework,

\[ F_{X'} = \mathcal{U}_{\alpha,\beta,\gamma}(F_X) = a\,\mathcal{U}_{\alpha,\beta,\gamma}(h) + b\,\mathcal{U}_{\alpha,\beta,\gamma}(v) = \begin{pmatrix} \frac{a+b}{2} + \frac{a-b}{2} \cos\beta & \frac{a-b}{2}\,e^{-i\gamma} \sin\beta \\ \frac{a-b}{2}\,e^{i\gamma} \sin\beta & \frac{a+b}{2} - \frac{a-b}{2} \cos\beta \end{pmatrix}. \tag{82} \]

We see that a general qubit observable depends on four parameters: the
eigenvalues $a$ and $b$, as well as the framework orientation angles $\beta$ and
$\gamma$. The complex representation of an observable stems solely from the
unitary rotation of the atomic propositions $h$ and $v$ to a different relative
framework. The observables no longer generally commute since the unitary
rotations need not commute.
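
The rotated framework can be explored numerically. The sketch below builds the rotor from the Cartan decomposition with NumPy and applies the two-sided product $U^\dagger A U$; the function names `rotor` and `rotate` and the particular angles are hypothetical choices for this illustration, not notation from the text.

```python
import numpy as np

def rotor(alpha, beta, gamma):
    """U = exp(i alpha sigma_z/2) exp(i beta sigma_y/2) exp(i gamma sigma_z/2)."""
    def phase(t):
        return np.diag([np.exp(1j * t / 2), np.exp(-1j * t / 2)])
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    rot = np.array([[c, s], [-s, c]], dtype=complex)   # exp(i beta sigma_y / 2)
    return phase(alpha) @ rot @ phase(gamma)

def rotate(U, A):
    """Two-sided product U^dagger A U that rotates an observable A."""
    return U.conj().T @ A @ U

h = np.diag([1.0, 0.0]).astype(complex)    # horizontal proposition
v = np.diag([0.0, 1.0]).astype(complex)    # vertical proposition

alpha, beta, gamma = 0.3, 0.7, 1.1         # arbitrary illustrative angles
U1 = rotor(alpha, beta, gamma)
U2 = rotor(2.0, beta, gamma)               # same beta and gamma, different alpha
h1, v1 = rotate(U1, h), rotate(U1, v)

# Rotated Stokes-like observable with eigenvalues +1 and -1.
F_rot = rotate(U1, h - v)
print(np.round(h1, 4))
print(np.round(F_rot, 4))
```

Changing `alpha` leaves the rotated propositions unchanged because the outer phase factor commutes with the diagonal matrices `h` and `v`, which is the numerical counterpart of the statement that the rotated framework depends only on the two angles $\beta$ and $\gamma$.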



B. States, densities, and collapse 

Quantum states. — A quantum state P is a classical 
state defined in a particular framework X that is then 
extended to apply to the entire quantum Boolean algebra $\Sigma_{Q(X)}$. The
extension of a classical state $P$ that has been defined in a framework $X$ to a
proposition $x' = \mathcal{U}(x) \in X' = \mathcal{U}(X)$ in a different framework can
be accomplished by heuristically breaking down the state into a composition of
the classical state in framework $X$ and transition probabilities $D_x(x')$ that
connect the framework $X$ to the different framework $X'$,

\[ P(x') = \sum_{x \in X} P(x)\,D_x(x'). \tag{83} \]

The transition probabilities characterize a disturbance (11) that connects the
classical state $P$ to propositions in incompatible frameworks.

To define the transition probabilities, we assume that atomic propositions in
the framework $X$ are undisturbed, so $D_x(x) = 1$. The only classical state with
this property is the pure state which has a trace-density $\rho = x$. Hence, we
assume that we can consistently write the transition probability $D_x(x')$ in
terms of the extension of the trace to the full Boolean algebra $\Sigma_{Q(X)}$,

\[ D_x(x') = \mathrm{Tr}(x\,x'). \tag{84} \]

Notably, this definition makes the transition between frameworks symmetric.

Born rule. — We pick the trace extension to be 
the unique measure that satisfies the cyclic property
$\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ for all $A, B \in \Sigma_{Q(X)}^{\mathbb{R}}$ and agrees
with the classical trace (7) within any specific framework [13]. On a Hilbert
space, (84) has the familiar form,

\[ D_x(x') = \mathrm{Tr}\big(|x\rangle\langle x|\,|x'\rangle\langle x'|\big) = |\langle x|x'\rangle|^2, \tag{85} \]

which we immediately recognize as the Born rule [83]. Hence, the complex square
of the Hilbert space inner product can be seen as a disguised form of the
natural extension of the trace to define transition probabilities between
propositions in incompatible frameworks. If we recall that
$x' = \mathcal{U}(x) = U^\dagger x U$ we can also write the transition probability
(85) in terms of the unitary rotor that connects the two propositions,
$D_x(x') = \mathrm{Tr}\big(|x\rangle\langle x|\,U^\dagger |x\rangle\langle x|\,U\big) = |\langle x|U|x\rangle|^2$.

Density operator. — We can rewrite (83) in a more familiar form by using the Born
rule (84) and the full trace-density of the original state
$\rho = \sum_{x \in X} P(x)\,x$, which is traditionally known as the density
operator,

\[ P(x') = \sum_{x \in X} P(x)\,\mathrm{Tr}(x\,x') = \mathrm{Tr}(\rho\,x'). \tag{86} \]

This form of the probability functional conforms to Gleason's theorem [84]. We
note, however, that it is the extension of the trace that extends the state to
the noncommutative quantum setting since the trace-density $\rho$ is identical to
a classical trace-density in some particular framework $X$.

Moments. — Since the probabilities $P(x')$ are well-defined for a proposition in any framework $x' \in X'$, we can linearly extend $P$ to an expectation functional $\langle \cdot \rangle$ on the entire quantum observable algebra $\Sigma^{\mathbb{R}}_{Q(X)}$,

$$\langle F_{X'} \rangle = \sum_{x' \in X'} f_{X'}(x')\, P(x') = \mathrm{Tr}(\rho\, F_{X'}). \tag{87}$$

Similarly, observable moments will be well-defined by the expectation functional,

$$\langle (F_{X'})^n \rangle = \sum_{x' \in X'} (f_{X'}(x'))^n\, P(x') = \mathrm{Tr}(\rho\, (F_{X'})^n). \tag{88}$$

Hence, the unitary rotations and resulting extension of the trace completely construct the quantum probability space from a single classical probability space and its associated observables.

Double-sided AND. — To be consistent with the assumptions made in (84), we must also ensure that conditioning a quantum state on an atomic proposition will collapse the state to a pure state with a trace-density equal to that atomic proposition. In other words, we must generalize the logical and of the classical case to the noncommutative incompatible frameworks in the quantum case. The consistent way to do this is through a double-sided product: given atomic propositions $x \in X$ and $x' \in X'$, then

$$x' x x' = |x'\rangle\langle x'|x\rangle\langle x|x'\rangle\langle x'| = \mathrm{Tr}(x x')\, x' = D_x(x')\, x'.$$

The double-sided product with $x'$ produces a transition probability $D_x(x')$ from $x$ to $x'$ as a proportionality factor in addition to collapsing the original proposition $x$ to $x'$. In this sense, the double-sided product includes a form of disturbance in addition to the logical and of pure conditioning. If $X = X'$, so the frameworks coincide, then $x$ and $x'$ will commute; the disturbance will vanish, reducing the transition probability $D_x(x')$ to either 0 or 1; and the classical and will be recovered as a special case.

Liiders ' rule. — Using the double-sided product as a dis- 
turbance followed by a logical and, we find the quantum 
form of the invasive conditioning rule (|13p . 



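$$\widetilde{\langle F_X \rangle}_y = \frac{\langle y F_X y \rangle}{P(y)} = \mathrm{Tr}(\tilde{\rho}_y F_X), \tag{89a}$$

$$\tilde{\rho}_y = \frac{y \rho y}{\mathrm{Tr}(\rho y)}, \tag{89b}$$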



for any Boolean proposition ?; in a framework algebra Y^x 
measured prior to the observable Fx. As with the clas- 
sical case, we use the tilde to indicate the intrinsic quan- 
tum invasiveness of the measurement process. If p and y 
commute, or if Fx and y commute, then the noninvasive 
classical conditioning rule (|5]) is properly recovered. This 
generalization of p3p is known as the projection postu- 
late, or Liiders' Rule 85]. If y is an atomic proposition 
in X, then py = y as in the classical case ([8]) and we 
consistently recover the assumption . 

For contrast, Leifer and Spekkens [6^ provide a care- 
ful quantum generalization of the noninvasive condition- 
ing rule ([8]) using a formalism based around conditional 
density operators. They confirm that Liider's Rule ([M)) 
cannot be obtained with pure conditioning, so it must 
imply additional disturbance from the measurement pro- 
cess itself, as indicated here. 

Aharonov-Bergmann-Lebowitz rule. — Just as with classical invasive conditioning, the order of conditioning will generally matter. Specifically, substituting a system proposition $z \in \Sigma_X$ into (89) yields $\widetilde{\langle z \rangle}_y = P(yzy)/P(y)$; however, $P(yzy) \neq P(zyz)$, so the "joint probability" in the numerator is order-dependent unless $y$ and $z$ commute, just as in the classical case. That is, $\widetilde{\langle z \rangle}_y$ explicitly describes the case when the conditioning proposition $y$ is measured first as a preselection, followed by the proposition $z$.

To obtain the converse case when the conditioning proposition $z$ is measured second as a postselection, we must derive the quantum form of (27). As in the classical case, we reinterpret the denominator of (89) as a marginalization $P(y) = \sum_z P(yzy)$ of the ordered joint probability that renormalizes the conditioning procedure; the identity $\sum_z z = 1_X$ permits the equality. With this interpretation, the postselected form of conditioning becomes straightforward,

$${}_z\widetilde{\langle y \rangle} = \frac{P(yzy)}{\sum_{y' \in Y} P(y' z y')}. \tag{90}$$

As in the classical case, the different position of the subscript serves to distinguish the two conditioned expectations $\widetilde{\langle \cdot \rangle}_y$ and ${}_z\widetilde{\langle \cdot \rangle}$ corresponding to different measurement orderings.

For a pure state $\rho = x = |x\rangle\langle x|$, this postselected conditioning is known as the Aharonov-Bergmann-Lebowitz (ABL) rule [86], and has the form ${}_z\widetilde{\langle y \rangle} = |\langle z|y\rangle|^2 |\langle y|x\rangle|^2 / \sum_{y' \in Y} |\langle z|y'\rangle|^2 |\langle y'|x\rangle|^2$. Unlike Lüders' rule (89), the generalized ABL rule (90) does not perform a simple update to the trace-density $\rho$; moreover, it depends on the entire disturbance of the first measurement via the normalization sum in the denominator.
If y and z commute, then the disturbance vanishes and 
we again correctly recover the classical case ([5]) that is 
order-independent.
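The two orderings of invasive conditioning can be compared numerically. The sketch below is only an illustration under stated assumptions: a pure linear-polarization preselection, the framework $\{h, v\}$ for the intermediate proposition $y$, and a pure postselection $z$; the angles and the helper names (`ket`, `prob`, `joint`) are hypothetical. It evaluates the ordered joint probability $P(yzy)$, the preselected conditioning $P(yzy)/P(y)$, and the postselected (ABL) conditioning $P(yzy)/\sum_{y'} P(y'zy')$.

```python
import math

def ket(angle):
    """Linear polarization at `angle` from horizontal (real amplitudes)."""
    return [math.cos(angle), math.sin(angle)]

def projector(k):
    return [[a * b for b in k] for a in k]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

rho = projector(ket(math.pi / 8))                        # pure preselection x
Y = {"h": projector(ket(0.0)), "v": projector(ket(math.pi / 2))}
z = projector(ket(math.pi / 3))                          # postselection

def prob(op):
    """P(op) = Tr(rho op)."""
    return trace(matmul(rho, op))

# Ordered joint probabilities P(yzy) for each intermediate outcome y.
joint = {name: prob(matmul(matmul(y, z), y)) for name, y in Y.items()}

# Preselected conditioning: z given that y was measured first.
luders = {name: joint[name] / prob(Y[name]) for name in Y}
# Postselected (ABL) conditioning: y given the later postselection z.
abl = {name: joint[name] / sum(joint.values()) for name in Y}

# Reversing the order gives P(zyz), which differs from P(yzy) in general.
reversed_order = prob(matmul(matmul(z, Y["h"]), z))
print(joint, luders, abl, reversed_order)
```

The ABL probabilities sum to one by construction, whereas `reversed_order` differs from `joint["h"]` because $y$ and $z$ do not commute for these angles.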

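Bayes' rule. — The two forms of quantum invasive conditioning also lead to a modified form of Bayes' rule that relates the preselected conditioning of a sequence to the postselected conditioning of the same sequence, similarly to the classical case (29),

$${}_z\widetilde{\langle y \rangle} = \widetilde{\langle z \rangle}_y\, \frac{P(y)}{\sum_{y' \in Y} P(y')\, \widetilde{\langle z \rangle}_{y'}}. \tag{91}$$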
If y and z commute, then the disturbance vanishes and 
we correctly recover Bayes' rule pUj) . 

The unusual form of ([TO)) has led to postselected quan- 
tum conditioning being largely overlooked. The lack of 
symmetry in the density update under such postselected 
conditioning has even prompted works in multistate-density time-symmetric reformulations of quantum mechanics [17, 19-21, 24, 25, 87], which are outside the scope
of this work. However, we see here that the form of the 
conditioning is the same as the classically invasive post- 
selected conditioning (P7|) . Later we shall use a fully 
generalized form of the ABL rule (^0)) together with CVs 
to consider the subtle case of postselected averages of ob- 
servables in some detail, so we delay their consideration 
for now. 



1. Example: Polarization state 

A quantum state for a single system is a classical state in some particular framework. For a two-dimensional framework such as $\{h, v\}$, all probabilities for such a classical state can be completely specified by a mixing angle $\theta$ such that $P(h) = \cos^2(\theta/2)$ and $P(v) = \sin^2(\theta/2)$.
Hence, after rotating the trace-density p = P{h)h+P{v)v 
to an arbitrary framework according to ((5^ . any quan- 
tum state trace-density of polarization must have the 
form, 

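$$\rho_{\theta,\beta,\gamma} = \cos^2(\theta/2)\, \mathcal{U}_{\alpha,\beta,\gamma}(h) + \sin^2(\theta/2)\, \mathcal{U}_{\alpha,\beta,\gamma}(v) = \frac{1}{2} \begin{pmatrix} 1 + \cos\beta \cos\theta & e^{-i\gamma} \sin\beta \cos\theta \\ e^{i\gamma} \sin\beta \cos\theta & 1 - \cos\beta \cos\theta \end{pmatrix}. \tag{92}$$

The $\alpha$ parameter of the rotation disappears in favor of the $\theta$ parameter characterizing the classical state, leaving only three net parameters, in contrast to the four parameters of an arbitrary observable (82).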

The expectation functional $\langle \cdot \rangle_{\theta,\beta,\gamma}$ is then defined from the trace-density $\rho_{\theta,\beta,\gamma}$ and the unique extension of the trace Tr to the whole observable algebra $\Sigma^{\mathbb{R}}_{Q(X)}$ according to $\langle F_{X'} \rangle_{\theta,\beta,\gamma} = \mathrm{Tr}(\rho_{\theta,\beta,\gamma} F_{X'})$. The trace extension is the sum of the diagonal matrix elements in the matrix representation. Hence for the expectation of an arbitrary observable (82) under an arbitrary state (92) we find,

$$\langle \mathcal{U}_{\alpha',\beta',\gamma'}(F_X) \rangle_{\theta,\beta,\gamma} = \frac{f_X(h) + f_X(v)}{2} + \frac{f_X(h) - f_X(v)}{2}\, (\cos\theta)\, S, \tag{93a}$$

$$S = \cos\beta \cos\beta' + \sin\beta \sin\beta' \cos(\gamma - \gamma'), \tag{93b}$$

where $S \in [-1, 1]$ is an interference factor that depends only on relative orientation between the state framework and the observable framework. If the frameworks coincide, then $S = 1$ and the classical result is recovered.
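The interference-factor form of the expectation value can be checked numerically. The following sketch assumes a Bloch-sphere parametrization in which the rotated propositions are $(1 \pm \hat{n}\cdot\vec{\sigma})/2$ with $\hat{n} = (\sin\beta\cos\gamma, \sin\beta\sin\gamma, \cos\beta)$, which reproduces the matrix in (92); it also assumes the rotated observable has eigenvalues written here as `f_h` and `f_v`. All numerical values and helper names are hypothetical. It compares $\mathrm{Tr}(\rho F)$ against the closed form $\tfrac{1}{2}(f_h + f_v) + \tfrac{1}{2}(f_h - f_v)(\cos\theta) S$.

```python
import cmath
import math

def bloch_projector(beta, gamma, sign):
    """(1 + sign * n.sigma)/2 with n = (sin b cos g, sin b sin g, cos b)."""
    nz = sign * math.cos(beta)
    off = sign * math.sin(beta) * cmath.exp(-1j * gamma)
    return [[(1 + nz) / 2, off / 2], [off.conjugate() / 2, (1 - nz) / 2]]

def combine(c_plus, c_minus, beta, gamma):
    """c_plus * U(h) + c_minus * U(v) for the framework oriented along (beta, gamma)."""
    p = bloch_projector(beta, gamma, +1)
    m = bloch_projector(beta, gamma, -1)
    return [[c_plus * p[i][j] + c_minus * m[i][j] for j in range(2)]
            for i in range(2)]

def expectation(rho, obs):
    """Tr(rho obs)."""
    return sum(rho[i][k] * obs[k][i] for i in range(2) for k in range(2)).real

theta, beta, gamma = 0.7, 1.1, 0.3                 # state parameters
f_h, f_v, beta_p, gamma_p = 2.0, -1.0, 0.4, 1.9    # observable parameters

rho = combine(math.cos(theta / 2) ** 2, math.sin(theta / 2) ** 2, beta, gamma)
obs = combine(f_h, f_v, beta_p, gamma_p)

s = (math.cos(beta) * math.cos(beta_p)
     + math.sin(beta) * math.sin(beta_p) * math.cos(gamma - gamma_p))
closed_form = (f_h + f_v) / 2 + (f_h - f_v) / 2 * math.cos(theta) * s
numeric = expectation(rho, obs)
print(numeric, closed_form)
```

Setting `beta_p = beta` and `gamma_p = gamma` makes `s` equal to one, which is the coinciding-framework case mentioned in the text.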



C. Detectors and probability observables 

Joint observable space. — As with the classical case, we 
can couple a system to a detector by enlarging the sam- 
ple space to the product space XY of a particular pair 
of frameworks. We can then perform local unitary rota- 
tions on each space independently to form a joint quan- 
tum sample space from the classical joint observables 
Q{X)Q{Y). However, the quantum observable space also 
admits global unitary rotations on the classical joint ob- 
servables to form a larger joint quantum sample space 
Q{XY). Just as with a single sample space, any two 
propositions in Q{XY) can be continuously connected 
with some global unitary rotation. 

The full quantum observable space $\Sigma^{\mathbb{R}}_{Q(XY)}$ is constructed from $Q(XY)$ in the usual way. Product observables will maintain their product form under local unitary rotations, $\mathcal{U}_X \mathcal{V}_Y(A_X B_Y) = \mathcal{U}_X(A_X)\, \mathcal{V}_Y(B_Y)$. However, global unitary rotations can create unfactorable correlated joint observables in $\Sigma^{\mathbb{R}}_{Q(XY)}$ even from product observables, $\mathcal{U}(A_X B_Y)$.

Joint states. — Similarly, joint states on a classical 
product framework extend to joint quantum states on 
the quantum product observable space. Under local uni- 
tary rotations, product states remain product states and 
classically correlated states between two specific frame- 
works remain classically correlated. However, global uni- 
tary rotations performed on any state can also form en- 
tangled states that have no analog in the classical the- 
ory [s^. Entangled states have some degree of local- 
rotation-independent correlation between frameworks, so 
display a stronger degree of correlation than can even be defined with a classically correlated state that is restricted to a single pair of frameworks. As an extreme example, maximally entangled states are completely local-rotation-independent and perfectly correlated with respect to any pair of frameworks.

Quantum operations. — The specifics of entanglement do not concern us here, since any type of correlation is sufficient to represent detector probabilities within the reduced system space. For the purposes of measurement, we only assume that the correlated state with density $\rho = \mathcal{U}^\dagger(\rho_X \rho_Y) = U \rho_X \rho_Y U^\dagger$ is connected to some initial product state with density $\rho_X \rho_Y$ via a unitary rotation. Since all quantum states can be continuously connected with some global unitary rotation that acts as a disturbance, this is always possible. Physically, the unitary rotation couples the known detector state $\rho_Y$ to an unknown system state $\rho_X$. Furthermore, we assume that the initial state of the detector has some (not necessarily unique) pure-state expansion that is meaningful with respect to the preparation procedure, $\rho_Y = \sum_{y' \in Y'} P'(y')\, y'$.

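It then follows that the numerator for the conditioning rules (89) and (90) becomes,

$$\langle y F_X y \rangle = \mathrm{Tr}(\rho\, y F_X y) = \mathrm{Tr}_X(\mathrm{Tr}_Y(U \rho_X \rho_Y U^\dagger\, y F_X y)) = \langle \mathcal{E}_y(F_X) \rangle_X = \mathrm{Tr}_X(\mathcal{E}_y^\dagger(\rho_X)\, F_X), \tag{94}$$

with the operations $\mathcal{E}_y$ and $\mathcal{E}_y^\dagger$ defined as,

$$\mathcal{E}_y(F_X) = \langle U^\dagger y F_X y U \rangle_Y = \sum_{y' \in Y'} P'(y')\, \mathrm{Tr}_Y(y' U^\dagger y F_X y U) = \sum_{y' \in Y'} M^\dagger_{y,y'} F_X M_{y,y'}, \tag{95a}$$

$$\mathcal{E}_y^\dagger(\rho_X) = \mathrm{Tr}_Y(y U \rho_X \rho_Y U^\dagger y) = \sum_{y' \in Y'} P'(y')\, \mathrm{Tr}_Y(y U \rho_X y' U^\dagger y), \tag{95b}$$

$$M_{y,y'} = e^{i\phi_{y,y'}} \sqrt{P'(y')}\, \langle y|U|y' \rangle, \tag{95c}$$

$$M^\dagger_{y,y'} = e^{-i\phi_{y,y'}} \sqrt{P'(y')}\, \langle y'|U^\dagger|y \rangle. \tag{95d}$$

Here, the Hilbert space representations of the Kraus operators $\{M_{y,y'}\}$ have the form of partial matrix elements and are only well-defined up to the arbitrary phase factors $e^{i\phi_{y,y'}}$. We also stress that $\{M_{y,y'}\}$ depend not only on the measured detector outcome $y$, but also on a particular detector preparation $y'$.

As a result, we find the quantum versions of the probability observables and the general invasive measurement (24),

$$P(y) = \langle \mathcal{E}_y(1_X) \rangle_X = \langle E_y \rangle_X, \tag{96}$$

$$E_y = \mathcal{E}_y(1_X) = \langle U^\dagger y U \rangle_Y = \sum_{y' \in Y'} M^\dagger_{y,y'} M_{y,y'}, \tag{97}$$

$$\widetilde{\langle F_X \rangle}_y = \frac{\langle \mathcal{E}_y(F_X) \rangle_X}{\langle \mathcal{E}_y(1_X) \rangle_X} = \frac{\sum_{y' \in Y'} \mathrm{Tr}_X(\rho_X M^\dagger_{y,y'} F_X M_{y,y'})}{\mathrm{Tr}_X(\rho_X E_y)}. \tag{98}$$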

Similarly to the invasive classical case ([25|) . the mea- 
surement of y on the detector must be described by a 
quantum operation £y in ()94p , which is a completely pos- 
itive map 0, 043 llQ . [tS . 89] that performs a generalized 
measurement on the system state corresponding to the 
detector outcome $y$. The operation $\mathcal{E}_y$ acting on the identity in (97) produces a positive operator known as a quantum effect, $E_y$. By construction, the set of operations $\{\mathcal{E}_y\}$ preserves the identity, $\sum_y \mathcal{E}_y(1_X) = 1_X$; hence, the effects form a partition of the identity, $\sum_y E_y = 1_X$,
making them probability observables over a particular 
detector framework exactly as in (f^ . 

Sequences of measurements emphasize the temporal ordering of operations, just as in the invasive classical case (26). Given two sets of quantum operations that define the sequential interaction of two detectors with the system and their subsequent conditioning, $\{\mathcal{E}_y\}$ and $\{\mathcal{E}'_z\}$, the joint probability of the ordered sequence of detector outcomes $(y, z)$ is,

$$P(y)\, P(z|y) = P(yzy) = P(y z 1_X z y) = \langle \mathcal{E}_y(\mathcal{E}'_z(1_X)) \rangle_X = \langle \mathcal{E}_y(E'_z) \rangle_X, \tag{99}$$

where $E'_z = \mathcal{E}'_z(1_X)$. The proper sequential probability observable $\mathcal{E}_y(E'_z) = \sum_{y'} M^\dagger_{y,y'} E'_z M_{y,y'}$ is not a simple product of the individual probability observables $E_y$ and $E'_z$.

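These sequence probabilities then give us the full generalization of the ABL rule (90),

$${}_z\widetilde{\langle y \rangle} = \frac{\langle \mathcal{E}_y(E'_z) \rangle_X}{\langle \mathcal{E}(E'_z) \rangle_X} = \frac{\langle \mathcal{E}_y(E'_z) \rangle_X}{\sum_{y'' \in Y} \langle \mathcal{E}_{y''}(E'_z) \rangle_X} = \frac{\sum_{y' \in Y'} \mathrm{Tr}_X(\rho_X M^\dagger_{y,y'} E'_z M_{y,y'})}{\sum_{y'' \in Y} \sum_{y' \in Y'} \mathrm{Tr}_X(\rho_X M^\dagger_{y'',y'} E'_z M_{y'',y'})}, \tag{100}$$

and the most general version of the invasive quantum Bayes' rule (91),

$${}_z\widetilde{\langle y \rangle} = \widetilde{\langle E'_z \rangle}_y\, \frac{P(y)}{\langle \mathcal{E}(E'_z) \rangle_X}. \tag{101}$$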



As with (27) and (90), the postselected conditioning (100) depends on the entire disturbance of the first measurement via the nonselective measurement $\mathcal{E} = \sum_{y'' \in Y} \mathcal{E}_{y''}$ in the denominator.

The noncommutativity of the detection operations $\mathcal{E}_y$ emphasizes the fact that measurement is an active process: an experimenter alters the quantum state by coupling it to a detector and then conditioning on acquired information from the detector. Without some filtering process that completes the disturbance implied by
there is no measurement. The nonselective measurement 
$\mathcal{E}$ also includes the active disturbance of the measurement
process, but does not condition on a particular outcome. 
Furthermore, measuring a quantum state in a different 
order generally disturbs it differently. The state may also 
in certain conditions be probabilistically "uncollapsed" 
back to where it started by using the correct condition- 
ing sequence In this sense, sequential quantum 
conditioning is analogous to a stochastic control process 
that guides the progressive disturbance of a state along 
some trajectory in the state space [l6j . 

Measurement operators. — Since the quantum operation $\mathcal{E}_y$ performs a measurement, we will refer to its Kraus operators $\{M_{y,y'}\}$ (95) as measurement operators. However, a quantum operation generally has many equivalent double-sided product expansions like (95a) in terms of measurement operators. Each such set of measurement operators $\{M_{y,y'}\}$ corresponds to a specific choice of framework for the preparation of the detector state.

Given a specific set of measurement operators, the substitution $M_{y,y'} \to U_{y,y'} M_{y,y'}$ with unitary $U_{y,y'}$ will produce the same effect $E_y$ according to (97), but will correspond to a different operation $\mathcal{E}'_y$. Hence, we conclude that many measurement operations can produce the same probability observables on the system space [90]. Therefore, probability observables are not sufficient to completely specify a quantum measurement: one needs to specify the full operations as in the classically invasive case (25).
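The statement that different measurement operators can share one probability observable can be illustrated with a single-operator (pure) example. In the sketch below, the operator `m`, the unitary `u`, and the input state `rho` are hypothetical; the code checks that $M^\dagger M$ and $(UM)^\dagger (UM)$ agree, that the outcome probability is therefore the same, and that the conditioned states differ.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

# Hypothetical measurement operator M_y and unitary U_y.
m = [[math.sqrt(0.9), 0.0], [0.0, math.sqrt(0.6)]]
c, s = math.cos(0.5), math.sin(0.5)
u = [[c, -s], [s, c]]
m_alt = matmul(u, m)                      # substituted operator U_y M_y

effect = matmul(dagger(m), m)             # E_y = M^dagger M
effect_alt = matmul(dagger(m_alt), m_alt)

def update(op, rho):
    """Return (probability, normalized state) for M rho M^dagger."""
    out = matmul(matmul(op, rho), dagger(op))
    p = (out[0][0] + out[1][1]).real
    return p, [[entry / p for entry in row] for row in out]

rho = [[0.5, 0.5], [0.5, 0.5]]            # hypothetical input state
p, post = update(m, rho)
p_alt, post_alt = update(m_alt, rho)
print(p, p_alt, post[0][0], post_alt[0][0])
```

The two updates agree on the statistics of the outcome but not on the state that is passed to later measurements, which is the sense in which the effects alone underdetermine the measurement.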

Quantum process tomography. — Just as classical prob- 
ability observables can be characterized via process to- 
mography, operations can be characterized by quantum 
process tomography. One performs quantum process to- 
mography by sending known states into a detector, mea- 
suring the detector, then measuring the resulting states 
to see how the state was changed by the detector. Since 
quantum operations contain information about distur- 
bance as well as conditioning, quantum process tomog- 
raphy generally requires more characterization measure- 
ments than pure classical process tomography. 

Pure operations. — An initially pure detector state with density $y'$ produces a pure operation $\mathcal{E}_y(F_X) = M_y^\dagger F_X M_y$ with a single associated measurement operator $M_y = e^{i\phi_y} \langle y|U|y' \rangle$ that is unique up to the arbitrary phase factor $e^{i\phi_y}$. Most laboratory preparation procedures for the detector are designed to produce a pure initial state, so pure operations will be the typical case. A pure operation has the additional property of partially collapsing a pure state to another pure state. It is also most directly related to the probability observable $E_y = M_y^\dagger M_y$, since the single measurement operator has a polar decomposition $M_y = U_y E_y^{1/2}$ in terms of the positive root of the probability observable $E_y^{1/2}$.

Weak measurement. — If we wish for such a conditioning process to leave the state approximately unchanged, we must make a weak measurement, just as in the classical case (19). However, a quantum weak measurement requires a strict condition regarding the measurement operations and not just the probability observables, due to the additional disturbance in the measurement. Formally, the measurement operations typically depend on a measurement strength parameter $\epsilon$ such that,

$$\forall y \in Y, \quad \lim_{\epsilon \to 0} \mathcal{E}_y(\epsilon; F_X) = P_Y(y)\, \mathcal{I}(F_X), \tag{102}$$

where $\mathcal{I}$ is the identity operation and $P_Y(y)$ is the probability for obtaining the detector outcome $y$ in the absence of interaction. As with the classical case, the limit as $\epsilon \to 0$ is an idealization known as the weak measurement limit and is not strictly achievable in the laboratory.

The definition (102) implies that subsequent measurements will be unaffected, $\forall y \in Y$, $\lim_{\epsilon \to 0} \widetilde{\langle F_X \rangle}_y = \langle F_X \rangle_X$, and that the probability observables are proportional to the identity in the weak limit, $\forall y \in Y$, $\lim_{\epsilon \to 0} E_y(\epsilon) = P_Y(y)\, 1_X$, just as in the classical case (19). It also follows that any set of measurement operators $\{M_{y,y'}(\epsilon)\}$ that characterize $\mathcal{E}_y(\epsilon)$ must also be proportional to the identity in the weak limit, $\forall y \in Y, y' \in Y'$, $\lim_{\epsilon \to 0} M_{y,y'}(\epsilon) \propto 1_X$.
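A simple family of measurement operators shows how the weak limit behaves. The sketch assumes a hypothetical two-outcome detector with diagonal operators $M_\pm(\epsilon)$ whose squared entries are $(1 \pm \epsilon)/2$ and $(1 \mp \epsilon)/2$; this family is an illustrative choice, not a device from the text. As $\epsilon \to 0$ both effects approach $\tfrac{1}{2} 1_X$ and the conditioned state approaches the initial state.

```python
import math

def weak_operators(eps):
    """Diagonal entries (h, v) of hypothetical M_plus and M_minus, 0 <= eps <= 1."""
    plus = [math.sqrt((1 + eps) / 2), math.sqrt((1 - eps) / 2)]
    minus = [math.sqrt((1 - eps) / 2), math.sqrt((1 + eps) / 2)]
    return {"+": plus, "-": minus}

def effects(eps):
    """Diagonal entries of E = M^dagger M for each outcome."""
    return {k: [m * m for m in diag] for k, diag in weak_operators(eps).items()}

def conditioned_state(rho, diag):
    """Return (probability, normalized state) for a real diagonal operator."""
    out = [[diag[i] * rho[i][j] * diag[j] for j in range(2)] for i in range(2)]
    p = out[0][0] + out[1][1]
    return p, [[entry / p for entry in row] for row in out]

def max_change(rho, eps):
    """Largest entrywise change of the state over both outcomes."""
    worst = 0.0
    for diag in weak_operators(eps).values():
        _, post = conditioned_state(rho, diag)
        worst = max(worst, max(abs(post[i][j] - rho[i][j])
                               for i in range(2) for j in range(2)))
    return worst

rho = [[0.5, 0.5], [0.5, 0.5]]            # hypothetical initial state
for eps in (0.5, 0.1, 0.01):
    print(eps, effects(eps)["+"], max_change(rho, eps))
```

For this family the state change scales linearly with `eps`, so each outcome carries little information when the disturbance is small, consistent with the need for many repetitions discussed below.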

Weak measurements are more interesting in the quan- 
tum case than in the classical case due to the existence 
of incompatible frameworks. Since a weak measurement 
of an observable does not appreciably affect the quantum 
state, subsequent measurements on incompatible observ- 
ables can be made that will probe approximately the 
same state. This technique allows (noisy) information 
about two incompatible frameworks to be gleaned from 
nearly the same quantum state in a single experiment, 
which is strictly impossible using strong measurements 
that collapse the state to a pure state in a particular 
framework after each measurement. The penalty for us- 
ing weak measurements is that many more measurements 
are needed than in the strong measurement case to over- 
come the ambiguity of the measurement, as discussed in 
the classical case. 



1. Example: Coverslip polarization detector 

To cement these ideas, we consider the task of indirectly measuring polarization in a particular framework. For specificity, we will consider the passage of a laser beam with unknown polarization through a glass microscope coverslip, as shown in Fig. 2. Fresnel reflection off the coverslip leads to a disparity between transmission and reflection of the polarizations, so comparing transmitted to reflected light allows a generalized measurement of polarization, as we demonstrated experimentally in [51].

FIG. 2. Coverslip polarization measurement. A laser beam passes through a preselection $x$-polarizer, a glass microscope coverslip, and a postselection $z$-polarizer. The transmission probabilities for each segment of the apparatus are shown. By assigning appropriate contextual values $f_Y(t)$ and $f_Y(r)$ (121) to the output ports of the coverslip, the polarization observable $F_X = f_X(h) h + f_X(v) v$ can be measured using the equivalent expansion in terms of the appropriate measurement context $F_X = f_Y(t)\, \mathcal{E}_t(1_X) + f_Y(r)\, \mathcal{E}_r(1_X)$. Averaging the same contextual values with pre- and postselected conditional probabilities ${}_z\widetilde{\langle t \rangle}_x = \langle x \mathcal{E}_t(z) x \rangle_X / (\langle x \mathcal{E}_t(z) x \rangle_X + \langle x \mathcal{E}_r(z) x \rangle_X)$ and ${}_z\widetilde{\langle r \rangle}_x = \langle x \mathcal{E}_r(z) x \rangle_X / (\langle x \mathcal{E}_t(z) x \rangle_X + \langle x \mathcal{E}_r(z) x \rangle_X)$ produces the conditioned average ${}_z\widetilde{\langle F_X \rangle}_x = f_Y(t)\, {}_z\widetilde{\langle t \rangle}_x + f_Y(r)\, {}_z\widetilde{\langle r \rangle}_x$.

The system sample space we wish to measure is the polarization with respect to the table, $h = |h\rangle\langle h|$ and $v = |v\rangle\langle v|$, which could in principle be measured ideally with a polarizing beam splitter. The detector sample space is the spatial degree of freedom of the transmitted ($t = |t\rangle\langle t|$) and reflected ($r = |r\rangle\langle r|$) ports of a coverslip rotated to some fixed angle with respect to the incident beam around an axis perpendicular to the table. The initial state of the detector is the pure state indicating that the beam enters a single incident port ($b = |b\rangle\langle b|$) of the coverslip with certainty. The rotation $\mathcal{U}^\dagger(\rho_X b) = U \rho_X b\, U^\dagger$ that couples the system to the detector describes the interaction of the beam with the coverslip and has a unitary rotor $U$ corresponding to the polarization-dependent scattering matrix of the coverslip. Assuming that the scattering preserves beams of pure polarization, so $h$ remains $h$ and $v$ remains $v$, the rotor decouples into a direct sum of rotors that are specific to each polarization,

$$U = U_h \oplus U_v, \tag{103}$$

meaning that $U$ has a block-diagonal structure when represented as a matrix.

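Selecting each output port of the coverslip produces the two measurement operators according to (95),

$$M_t = \langle t|U|b \rangle = \begin{pmatrix} \langle t|U_h|b \rangle & 0 \\ 0 & \langle t|U_v|b \rangle \end{pmatrix}, \tag{104a}$$

$$M_r = \langle r|U|b \rangle = \begin{pmatrix} \langle r|U_h|b \rangle & 0 \\ 0 & \langle r|U_v|b \rangle \end{pmatrix}, \tag{104b}$$

which characterize the pure measurement operations that modify observables according to (95a),

$$\mathcal{E}_t(F_X) = M_t^\dagger F_X M_t, \tag{105a}$$

$$\mathcal{E}_r(F_X) = M_r^\dagger F_X M_r, \tag{105b}$$

and their adjoints that modify the state density according to (95b),

$$\mathcal{E}_t^\dagger(\rho_X) = M_t \rho_X M_t^\dagger, \tag{106a}$$

$$\mathcal{E}_r^\dagger(\rho_X) = M_r \rho_X M_r^\dagger. \tag{106b}$$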

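The pure measurement operations in turn produce probability observables according to (97),

$$E_t = \mathcal{E}_t(1_X) = M_t^\dagger M_t = \begin{pmatrix} |\langle t|U_h|b \rangle|^2 & 0 \\ 0 & |\langle t|U_v|b \rangle|^2 \end{pmatrix}, \tag{107a}$$

$$E_r = \mathcal{E}_r(1_X) = M_r^\dagger M_r = \begin{pmatrix} |\langle r|U_h|b \rangle|^2 & 0 \\ 0 & |\langle r|U_v|b \rangle|^2 \end{pmatrix}, \tag{107b}$$

in the same framework as $h$ and $v$. These probability observables are therefore equivalent to classical probability observables (23) specified by the effective characterization probabilities $P(t|h) = |\langle t|U_h|b \rangle|^2$, $P(r|h) = |\langle r|U_h|b \rangle|^2$, $P(t|v) = |\langle t|U_v|b \rangle|^2$, and $P(r|v) = |\langle r|U_v|b \rangle|^2$.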

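The measurement operators (104) have a polar decomposition in terms of the roots of the probability observables and an extra unitary phase contribution,

$$M_t = \begin{pmatrix} e^{i\phi_{h,t}} & 0 \\ 0 & e^{i\phi_{v,t}} \end{pmatrix} \begin{pmatrix} \sqrt{P(t|h)} & 0 \\ 0 & \sqrt{P(t|v)} \end{pmatrix}, \tag{108a}$$

$$M_r = \begin{pmatrix} e^{i\phi_{h,r}} & 0 \\ 0 & e^{i\phi_{v,r}} \end{pmatrix} \begin{pmatrix} \sqrt{P(r|h)} & 0 \\ 0 & \sqrt{P(r|v)} \end{pmatrix}. \tag{108b}$$

Any nonzero relative phase, such as $\phi_{h,t} - \phi_{v,t}$, will affect the framework orientation for subsequent measurements; however, it will not contribute to the acquisition of information from the measurement since it does not contribute to the probability observables. Such relative phase is therefore part of the disturbance of the measurement process.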

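Specifically, the initial state of polarization $\rho_X$ will be conditioned by a selection of a particular port on the detector according to,

$$\widetilde{\langle F_X \rangle}_t = \frac{\langle \mathcal{E}_t(F_X) \rangle_X}{\langle \mathcal{E}_t(1_X) \rangle_X} = \frac{\mathrm{Tr}_X(M_t \rho_X M_t^\dagger F_X)}{\mathrm{Tr}_X(\rho_X E_t)}, \tag{109a}$$

$$\widetilde{\langle F_X \rangle}_r = \frac{\langle \mathcal{E}_r(F_X) \rangle_X}{\langle \mathcal{E}_r(1_X) \rangle_X} = \frac{\mathrm{Tr}_X(M_r \rho_X M_r^\dagger F_X)}{\mathrm{Tr}_X(\rho_X E_r)}. \tag{109b}$$

Although the probabilities in each denominator only depend on the probability observables, the altered states in each numerator depend on the measurement operations and will include effects from the relative phase in the measurement operators (108).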
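The split between information and disturbance in this example can be made concrete with a small numerical sketch. The transmission probabilities `P_T` and the phases in `PHASE` below are hypothetical numbers, not measured coverslip values, and the operators are stored by their diagonal entries in the $(h, v)$ basis. The code checks that the effects sum to the identity and do not depend on the phases, while the conditioned state acquires the relative phase in its off-diagonal element.

```python
import cmath
import math

P_T = {"h": 0.9, "v": 0.6}                      # hypothetical P(t|h), P(t|v)
PHASE = {"t": (0.3, -0.2), "r": (1.1, 0.5)}     # hypothetical (phi_h, phi_v) per port

def measurement_operator(port):
    """Diagonal entries of M_port: phase factor times root of P(port|polarization)."""
    probs = [P_T[p] if port == "t" else 1.0 - P_T[p] for p in ("h", "v")]
    return [cmath.exp(1j * phi) * math.sqrt(q)
            for phi, q in zip(PHASE[port], probs)]

def effect(port):
    """Diagonal entries of E_port = M^dagger M; the phases cancel."""
    return [abs(m) ** 2 for m in measurement_operator(port)]

def condition(rho, port):
    """Return (probability, normalized state) for M rho M^dagger."""
    m = measurement_operator(port)
    out = [[m[i] * rho[i][j] * m[j].conjugate() for j in range(2)]
           for i in range(2)]
    prob = (out[0][0] + out[1][1]).real
    return prob, [[entry / prob for entry in row] for row in out]

rho_diag = [[0.5, 0.5], [0.5, 0.5]]             # diagonally polarized input
prob_t, rho_t = condition(rho_diag, "t")
total = [a + b for a, b in zip(effect("t"), effect("r"))]
print(prob_t, total, cmath.phase(rho_t[0][1]))
```

With these numbers the transmission probability is the classical mixture $0.5 \times 0.9 + 0.5 \times 0.6$, and the off-diagonal element of the transmitted state carries the relative phase $\phi_{h,t} - \phi_{v,t}$.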



D. Contextual values 

Operation correspondence. — The introduction of con- 
textual values in the quantum case proceeds identically 
to the classical case of invasive measurements (|34p . Since 
we must generally represent detector probabilities by op- 
erations {£y} within the reduced system space according 
to ([97)1 and ([M|) , we must also generally represent detec- 
tor observables by weighted operations within the reduced 
system space, 



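$$\langle F_Y \rangle = \sum_{y \in Y} f_Y(y)\, P(y) = \sum_{y \in Y} f_Y(y)\, \langle \mathcal{E}_y(1_X) \rangle_X = \langle \mathcal{F}_X(1_X) \rangle_X, \tag{110}$$

$$\mathcal{F}_X = \sum_{y \in Y} f_Y(y)\, \mathcal{E}_y. \tag{111}$$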

If we are concerned with only a single measurement, or 
are working within a single framework as in the classical 
formalism, then for all practical purposes the operation 
Tx reduces to its associated system observable Fx = 
^x{^x) as in the classical definition (l3Tt . 

Contextual values. — We observe a corollary exactly as 
in the classical case ^62\ : if we can expand a system 
observable in terms of the probability observables gener- 
ated by a particular measurement operation, then that 
observable can also be expressed as an equivalent detector 
observable, 



$$F_X = \sum_{y \in Y} f_Y(y)\, E_y \quad \implies \quad F_Y = \sum_{y \in Y} f_Y(y)\, y, \tag{112}$$

which is the quantum form of our main result originally introduced in [49]. As in the classical case, we dub the required detector labels $f_Y(y)$ the contextual values (CV) of the quantum observable $F_X$ with respect to the context of a specific detection scheme as represented in the system space by the measurement operations $\{\mathcal{E}_y\}$. Since many measurement operations produce the same probability observables $\{\mathcal{E}_y(1_X) = E_y\}$, many detection schemes can use the same CVs to reproduce an observable average.



Moments. — As with classically invasive measurements 
([55]) , higher statistical moments of the observable require 
more care to measure. For instance, we require the following equality in order to accurately reproduce the $n$th moment of an observable indirectly using the same CV,

$$\langle (F_X)^n \rangle_X = \sum_{y_1, \ldots, y_n \in Y} f_Y(y_1) \cdots f_Y(y_n)\, \langle E_{y_1} \cdots E_{y_n} \rangle_X. \tag{113}$$

However, as indicated in (99), performing a sequence of $n$ measurements produces the measurable probability $\langle \mathcal{E}_{y_1}(\cdots(\mathcal{E}_{y_n}(1_X))\cdots) \rangle_X \neq \langle E_{y_1} \cdots E_{y_n} \rangle_X$. Indeed, $\langle E_{y_1} \cdots E_{y_n} \rangle_X$ will not generally be a well-formed probability. To obtain the equality (113) with a particular choice of CV, we need the additional constraint that all the measurement operators must commute with each other. As a result, they must be part of the same framework as the system observable and hence commute with that observable as well. We will call any detector with commuting measurement operators with respect to a particular observable a fully compatible detector for that observable. Evidently, this is a strict requirement for a detector.

Alternatively, as with the classical case, we can change the CVs to define new observables that correspond to powers of the original observable, such as $G_X = (F_X)^n = \sum_{y \in Y} g_Y(y)\, E_y$. These new observables can then be measured indirectly using the same experimental setup without the need for measurement sequences. The CVs $g_Y(y)$ for the $n$th power of $F_X$ will not be a simple power of the CVs $f_Y(y)$ for $F_X$ unless the measurement is unambiguous.

Correlation functions. — If a time-evolution unitary rotation $\mathcal{U}_t$ is inserted between different observable measurements, then we obtain a quantum correlation function instead,

$$\langle F_X(0)\, G_X(t) \rangle = \langle \mathcal{F}_X(\mathcal{U}_t(\mathcal{G}_X(1_X))) \rangle_X, \tag{114}$$

which should be compared to the classical case (36). Similarly, $n$-time correlations can be defined with $n - 1$ time-evolutions between the observable measurements, $\langle \mathcal{F}_1(\mathcal{U}_{t_1}(\mathcal{F}_2(\cdots \mathcal{U}_{t_{n-1}}(\mathcal{F}_n(1_X)) \cdots))) \rangle_X$.

Inversion. — Since the CVs depend only on the probability observables, which commute with the measured observable for a fully compatible detector, the procedure for determining the CVs will be identical to the classical case. That is, the contextual values of a quantum observable exactly correspond to the detector labels for a classically ambiguous detector. We shall refer the reader back to the classical inversion (40) for discussion on how to solve the relation (112). As a reminder, we advocate the pseudoinverse as a principled approach for picking the CVs in the event of redundancy or coarse-graining.
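To illustrate the redundant case, the following sketch picks CVs for a hypothetical three-outcome detector monitoring a two-dimensional observable. It assumes that the pseudoinverse meant here is the Moore-Penrose pseudoinverse and that the matrix of characterization probabilities has full row rank, in which case the pseudoinverse solution is the minimum-norm vector $f_Y = P^T (P P^T)^{-1} f_X$. The probabilities and the function name `pseudoinverse_cvs` are invented for the example.

```python
def pseudoinverse_cvs(char_probs, eigenvalues):
    """Minimum-norm f_Y solving eigenvalues = char_probs @ f_Y for a 2 x n matrix.

    char_probs[0][k] = P(y_k | x_1), char_probs[1][k] = P(y_k | x_2).
    Assumes the two rows are linearly independent.
    """
    a, b = char_probs
    # Gram matrix G = P P^T (2 x 2).
    g11 = sum(p * p for p in a)
    g12 = sum(p * q for p, q in zip(a, b))
    g22 = sum(q * q for q in b)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("detector outcomes cannot distinguish the two propositions")
    # w = G^{-1} f_X
    f1, f2 = eigenvalues
    w1 = (g22 * f1 - g12 * f2) / det
    w2 = (g11 * f2 - g12 * f1) / det
    # f_Y = P^T w
    return [p * w1 + q * w2 for p, q in zip(a, b)]

probs = [[0.6, 0.3, 0.1],   # hypothetical P(y|h) for outcomes y = 1, 2, 3
         [0.1, 0.3, 0.6]]   # hypothetical P(y|v)
cvs = pseudoinverse_cvs(probs, (1.0, -1.0))
reconstructed = [sum(p * f for p, f in zip(row, cvs)) for row in probs]
print(cvs, reconstructed)
```

In this example the middle outcome is equally likely for both polarizations, so it carries no information about the observable and the minimum-norm solution assigns it a vanishing CV.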

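Conditioned averages. — We can construct a general postselected conditioned average from the CVs and the fully generalized ABL rule (100) analogously to the classical case (37),

$${}_z\widetilde{\langle F_X \rangle} = \sum_{y \in Y} f_Y(y)\, {}_z\widetilde{\langle y \rangle} = \frac{\langle \mathcal{F}_X(E'_z) \rangle_X}{\langle \mathcal{E}(E'_z) \rangle_X} = \frac{\sum_{y \in Y} \sum_{y' \in Y'} f_Y(y)\, \mathrm{Tr}(\rho_X M^\dagger_{y,y'} E'_z M_{y,y'})}{\sum_{y \in Y} \sum_{y' \in Y'} \mathrm{Tr}(\rho_X M^\dagger_{y,y'} E'_z M_{y,y'})}. \tag{115}$$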

We introduced this type of conditioned average in [i^ 
for the typical case of pure operations {£y} with single 
associated measurement operators {My}. 

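If the postselection is defined in the same framework as the measurement operation, then the nonselective measurement $\mathcal{E}$ in the denominator will reduce to unity, leaving a classical conditioned average,

$${}_z\widetilde{\langle F_X \rangle} = \frac{\sum_{y \in Y} f_Y(y)\, \langle E_y E'_z \rangle_X}{\langle E'_z \rangle_X} = \frac{\langle F_X E'_z \rangle_X}{\langle E'_z \rangle_X}, \tag{116}$$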



of the same form as p8|) . Similarly, the preselected con- 
ditioning (|98l) will also reduce to (|116p for such a case. 
This special case cannot exceed the eigenvalue range of 
the observable: the observable Fx will always reduce to 
its eigenvalues since either the state or the postselection 
commute with it. 

More generally, however, the combination of amplified CVs and the context-dependent probabilities in the general postselected average (115) can send it outside the eigenvalue range of the observable. As we discussed in
[HH loil l , having such a conditioned average stray outside 
the eigenvalue range of the observable is equivalent to a 
violation of a Leggett-Garg inequality that tests the as- 
sumptions of macrorealism under noninvasive detection. 
As a result, an eigenvalue range violation gives a direct 
indication of either nonclassicality present in a measure- 
ment sequence, or intrinsic measurement disturbance be- 
yond that of noninvasive classical conditioning as we saw 
in the example in ijllD 21 We refer the reader to [5l|, |9l| 
for more detail on this matter. 

Strong-conditioned average. — There are two other important special cases of the conditioned average (115) worth mentioning: strong measurement and weak measurement. The strong measurement case is distinguished by being constrained exclusively to the eigenvalue range of the observable. Specifically, (115) reduces to the form,

$${}_z\widetilde{\langle F_X \rangle} = \frac{\sum_{x \in X} f_X(x)\, P(x)\, D_x(z)}{\sum_{x \in X} P(x)\, D_x(z)} = \frac{\sum_{x \in X} f_X(x)\, \langle x|\rho|x \rangle\, |\langle x|z \rangle|^2}{\sum_{x \in X} \langle x|\rho|x \rangle\, |\langle x|z \rangle|^2}, \tag{117}$$

which contains only the eigenvalues $f_X(x)$ of the observable and factored probability products. However, it cannot be expressed solely in terms of the observable $F_X$ and a conditioned state as in the classical case (37) due to the disturbances $D_x(z)$. Only when the state or postselection commutes with the observable does (117) reduce to a special case of (116) and become free from disturbance.

Weak values. — The weak measurement case is distinguished by being the only case of the quantum postselected conditioned average (115) that can become context independent for any state and postselection (under certain conditions). The context-independent weak limit of
the conditioned average (IllSp is the weak value [13, 



$${}_z\langle F_X \rangle^w = \frac{\langle E'_z F_X + F_X E'_z \rangle_X}{2 \langle E'_z \rangle_X}, \tag{118}$$

and is expressed entirely in terms of the system expectation functional $\langle \cdot \rangle_X$, the postselection probability observable $E'_z$, and the observable $F_X$. Written in this form it is clear that it is a symmetrized version of the context-independent commuting case (116); however, unlike (116) the weak value (118) is not constrained to the eigenvalue range and can even diverge. For a pure initial state with trace-density $x$ and pure postselection $z$, the weak value (118) takes the traditional form,

$${}_z\langle F_X \rangle^w_x = \mathrm{Re}\, \frac{\langle z|F_X|x \rangle}{\langle z|x \rangle}. \tag{119}$$

We will consider under what conditions one can obtain such a weak value in Sec. III E.
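The equivalence of the symmetrized and traditional forms of the weak value for pure states, and the possibility of leaving the eigenvalue range, can be checked with a short numerical sketch. The observable `F` (eigenvalues $\pm 1$), the preselection angle, and the nearly orthogonal postselection angle are hypothetical choices; real amplitudes are used so that the real part is automatic.

```python
import math

def ket(angle):
    """Linear polarization at `angle` from horizontal (real amplitudes)."""
    return [math.cos(angle), math.sin(angle)]

def projector(k):
    return [[a * b for b in k] for a in k]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

F = [[1.0, 0.0], [0.0, -1.0]]             # hypothetical observable h - v
ket_x = ket(math.pi / 4)                  # preselection
ket_z = ket(-math.pi / 4 + 0.1)           # nearly orthogonal postselection
rho, z = projector(ket_x), projector(ket_z)

def expect(op):
    """Tr(rho op)."""
    return trace(matmul(rho, op))

# Symmetrized form: <zF + Fz> / (2 <z>).
zf, fz = matmul(z, F), matmul(F, z)
sym = [[zf[i][j] + fz[i][j] for j in range(2)] for i in range(2)]
weak_symmetrized = expect(sym) / (2 * expect(z))

# Traditional form for pure states: <z|F|x> / <z|x> (real here).
z_F_x = sum(ket_z[i] * F[i][j] * ket_x[j] for i in range(2) for j in range(2))
z_x = sum(a * b for a, b in zip(ket_z, ket_x))
weak_traditional = z_F_x / z_x
print(weak_symmetrized, weak_traditional)
```

For these angles the result equals $\cot(0.1)$, far above the largest eigenvalue of `F`, and it grows without bound as the postselection approaches orthogonality with the preselection.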



1. Example: Coverslip detector revisited 



Continuing the example from Sec. III C 1 and Fig. 2, observables defined in the same framework as the probability observables may be expressed in terms of the probability observables according to (112) using contextual
values (CVs), exactly as in the classical example (|T7)) . 



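$$F_X = f_X(h)\, h + f_X(v)\, v = f_Y(t)\, E_t + f_Y(r)\, E_r, \tag{120a}$$

$$\begin{pmatrix} f_X(h) \\ f_X(v) \end{pmatrix} = \begin{pmatrix} P(t|h) & P(r|h) \\ P(t|v) & P(r|v) \end{pmatrix} \begin{pmatrix} f_Y(t) \\ f_Y(r) \end{pmatrix}. \tag{120b}$$

Inverting this relation produces the unique CVs,

$$f_Y(t) = \frac{P(r|v)\, f_X(h) - P(r|h)\, f_X(v)}{P(t|h) P(r|v) - P(r|h) P(t|v)}, \tag{121a}$$

$$f_Y(r) = \frac{P(t|h)\, f_X(v) - P(t|v)\, f_X(h)}{P(t|h) P(r|v) - P(r|h) P(t|v)}. \tag{121b}$$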

The denominator is unity when the output ports of the
coverslip are perfectly correlated with the polarization.
Otherwise, the denominator is less than one and serves to
amplify the CVs to compensate for the ambiguity of the
detection. The numerator contains cross-compensation
factors that correct bias in the detector; that is, the
eigenvalue $f_X(h)$ for $h$ in the contextual value for
$t$ is weighted by the conditional probability $P(r|v)$
corresponding to the complementary quantities $v$ and $r$,
and so forth.
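The inversion leading to (121) can be sketched numerically as follows. This is only an illustration: the function name `coverslip_cvs` and the sample transmission probabilities are hypothetical, and the reflection probabilities are taken to be the complements of the transmission probabilities.

```python
def coverslip_cvs(p_t_h, p_t_v, f_h, f_v):
    """Contextual values (f_Y(t), f_Y(r)) from the 2x2 inversion in (121).

    p_t_h, p_t_v : assumed transmission probabilities P(t|h), P(t|v)
    f_h, f_v     : eigenvalues f_X(h), f_X(v) of the targeted observable
    """
    p_r_h, p_r_v = 1.0 - p_t_h, 1.0 - p_t_v
    det = p_t_h * p_r_v - p_r_h * p_t_v
    if abs(det) < 1e-12:
        raise ValueError("outputs carry no polarization information; CVs undefined")
    f_t = (p_r_v * f_h - p_r_h * f_v) / det
    f_r = (p_t_h * f_v - p_t_v * f_h) / det
    return f_t, f_r

# Perfectly correlated detector: the CVs are the eigenvalues themselves.
print(coverslip_cvs(1.0, 0.0, 1.0, -1.0))
# Ambiguous detector (hypothetical numbers): the CVs are amplified.
print(coverslip_cvs(0.6, 0.4, 1.0, -1.0))
```

In the ambiguous case the determinant is 0.2, so the eigenvalues ±1 are amplified to CVs of roughly ±5, while the weighted sums $P(t|h)f_Y(t) + P(r|h)f_Y(r)$ and $P(t|v)f_Y(t) + P(r|v)f_Y(r)$ still return +1 and −1.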

The CVs define the detector observable that is actually
being measured in the laboratory,

\[
F_Y = f_Y(t)\,t + f_Y(r)\,r. \tag{122}
\]



This detector observable corresponds to a detection
operation on the system space according to (111),

\[
\mathcal{F}_X = f_Y(t)\,\mathcal{E}_t + f_Y(r)\,\mathcal{E}_r, \tag{123}
\]

which fully describes the interaction with the detector,
subsequent conditioning, and experimental convention
for defining the observable. When no subsequent
conditioning is performed on the system, this operation
constructs the system observable $F_X = \mathcal{F}_X(1_X) =
f_Y(t)E_t + f_Y(r)E_r$, as desired.

Since the pure measurement operations all belong to
the same framework and commute with $F_X$, the operation
$\mathcal{F}_X$ is also fully compatible with the observable $F_X$,
meaning it can measure any moment of that observable
using the same CVs according to (113),

\begin{align}
\langle (\mathcal{F}_X)^n(1_X) \rangle_X &= \langle (F_X)^n \rangle_X, \tag{124}\\
  &= \sum_{i_1,\dots,i_n} f_Y(i_1)\cdots f_Y(i_n)\,\langle E_{i_1}\cdots E_{i_n}\rangle_X . \nonumber
\end{align}

The quantity $(\mathcal{F}_X)^n(1_X)$ indicates a sequence of $n$
consecutive measurements made by the same coverslip on the
beam to construct the observable $(F_X)^n$ for the $n$th
moment of $F_X$. That is, the output from each port of
the coverslip is fed back into the coverslip to be measured
again. There are $2^n$ possible outcome sequences
$(i_1, \dots, i_n)$ for $n$ traversals through the coverslip, each
with probability $\langle E_{i_1}\cdots E_{i_n}\rangle_X$ of occurring. These
probabilities are weighted with appropriate products of
corresponding CVs and summed to correctly construct the
$n$th moment of $F_X$.

Alternatively, one can change the CVs to directly measure
the observable $G_X = (F_X)^n = g_Y(t)E_t + g_Y(r)E_r$
from one traversal of the coverslip. The required CVs for
$G_X$,

\begin{align}
g_Y(t) &= \frac{P(r|v)\,(f_X(h))^n - P(r|h)\,(f_X(v))^n}{P(t|h)P(r|v) - P(r|h)P(t|v)}, \tag{125a}\\
g_Y(r) &= \frac{P(t|h)\,(f_X(v))^n - P(t|v)\,(f_X(h))^n}{P(t|h)P(r|v) - P(r|h)P(t|v)}, \tag{125b}
\end{align}

are not simple powers of the CVs (121) for $F_X$ unless the
measurement is unambiguous.
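The two routes to a moment described above (repeated traversals with the original CVs, or a single traversal with recomputed CVs) can be compared in a short sketch. The function names and the numerical values are hypothetical; the state enters only through its populations in $h$ and $v$, since the probability observables are diagonal in that framework.

```python
def moment_cvs(p_t_h, p_t_v, f_h, f_v, n=1):
    """CVs for (F_X)^n from a single traversal, following the pattern of (125)."""
    p_r_h, p_r_v = 1.0 - p_t_h, 1.0 - p_t_v
    det = p_t_h * p_r_v - p_r_h * p_t_v
    g_t = (p_r_v * f_h**n - p_r_h * f_v**n) / det
    g_r = (p_t_h * f_v**n - p_t_v * f_h**n) / det
    return g_t, g_r

def second_moment_from_sequences(p_t_h, p_t_v, f_h, f_v, pop_h):
    """Second moment from two traversals, weighting each outcome pair as in (124)."""
    f_t, f_r = moment_cvs(p_t_h, p_t_v, f_h, f_v, n=1)
    cvs = {"t": f_t, "r": f_r}
    cond = {"h": {"t": p_t_h, "r": 1.0 - p_t_h},
            "v": {"t": p_t_v, "r": 1.0 - p_t_v}}
    pops = {"h": pop_h, "v": 1.0 - pop_h}
    total = 0.0
    for i in "tr":
        for j in "tr":
            # <E_i E_j> for commuting, diagonal probability observables
            prob = sum(pops[s] * cond[s][i] * cond[s][j] for s in "hv")
            total += cvs[i] * cvs[j] * prob
    return total

f = moment_cvs(0.6, 0.4, 1.0, -1.0, n=1)   # CVs for F_X = h - v
g = moment_cvs(0.6, 0.4, 1.0, -1.0, n=2)   # CVs for (F_X)^2 = 1_X
m2 = second_moment_from_sequences(0.6, 0.4, 1.0, -1.0, pop_h=0.3)
print(f, g, m2)
```

With these numbers the CVs for $(F_X)^2$ are both 1, not the squares (25) of the CVs for $F_X$, yet both procedures give a second moment of 1.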

In addition to moments of $F_X$, we can obtain postselected
conditioned averages of $F_X$ by conditioning on a
second measurement outcome characterized by a probability
observable $E'_z$ after the measurement by the coverslip
according to (115),

\[
{}_z\langle F_X \rangle_X = \frac{\langle \mathcal{F}_X(E'_z) \rangle_X}{\langle \mathcal{E}(E'_z) \rangle_X}, \tag{126}
\]



where $\mathcal{E} = \mathcal{E}_t + \mathcal{E}_r$ is the nonselective measurement by the
coverslip. The second measurement could be a polarizer,
another coverslip, or any other method for measuring
polarization a second time.

If the initial state is pure with a density $\rho = x = |x\rangle\langle x|$
and the final postselection is also pure, $z = |z\rangle\langle z|$, then
(126) simplifies to a pre- and postselected conditioned
average,

\[
{}_z\langle F_X \rangle_x = \frac{f_Y(t)\,|\langle z|M_t|x\rangle|^2 + f_Y(r)\,|\langle z|M_r|x\rangle|^2}{|\langle z|M_t|x\rangle|^2 + |\langle z|M_r|x\rangle|^2}. \tag{127}
\]

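If we relate both pure states to the reference state $h$ via
unitary rotations as defined in (79), $x = \mathcal{U}_{\alpha,\beta,\gamma}(h)$ and
$z = \mathcal{U}_{\alpha',\beta',\gamma'}(h)$, then the probabilities take the form,

\begin{align}
|\langle z|M_t|x\rangle|^2 &= P^h(t)\cos^2(\beta/2)\cos^2(\beta'/2) \tag{128a}\\
  &\quad + P^v(t)\sin^2(\beta/2)\sin^2(\beta'/2) \nonumber\\
  &\quad + \frac{\sqrt{P^h(t)P^v(t)}}{2}\,\sin\beta\,\sin\beta' \times \cos(\gamma - \gamma' - \phi_{h,t} + \phi_{v,t}), \nonumber\\
|\langle z|M_r|x\rangle|^2 &= P^h(r)\cos^2(\beta/2)\cos^2(\beta'/2) \tag{128b}\\
  &\quad + P^v(r)\sin^2(\beta/2)\sin^2(\beta'/2) \nonumber\\
  &\quad + \frac{\sqrt{P^h(r)P^v(r)}}{2}\,\sin\beta\,\sin\beta' \times \cos(\gamma - \gamma' - \phi_{h,r} + \phi_{v,r}). \nonumber
\end{align}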

We see that each probability possesses an interference
term that stems from the relative orientations of the
incompatible frameworks for the preparation, measurement,
and postselection. In addition, the relative phases
in the measurement operators (108) will affect the
orientations of the frameworks and further disturb the
measurement, as mentioned. For the classical case, the
frameworks coincide, so $\beta, \beta' \in \{0, \pi\}$; the interference term
vanishes; and the probabilities reduce to the conditional
probabilities that characterize the probability observables.

The combination of the expanded range of the CVs
(121) and the interference term in the probabilities (128)
can make the postselected conditioned averages (126)
counter-intuitively exceed the eigenvalue range of the
observable $F_X$. Such a violation of the eigenvalue range
cannot occur from classical conditioning without disturbance
as in Sec. II D 2.
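A minimal numerical sketch of the pre- and postselected average (127) shows this eigenvalue-range violation explicitly. The assumptions here are loud ones: the measurement operators are taken to be real and diagonal, $M_t = \sqrt{P(t|h)}\,h + \sqrt{P(t|v)}\,v$ and similarly for $M_r$ (no relative phases), the transmission probabilities are hypothetical, and the function name is invented for the example.

```python
import math

def coverslip_conditioned_average(p_t_h, p_t_v, f_h, f_v, x, z):
    """Pre- and postselected conditioned average in the form of (127).

    Assumes phase-free diagonal measurement operators; x and z are lists of
    complex amplitudes in the (h, v) basis.
    """
    p_r_h, p_r_v = 1.0 - p_t_h, 1.0 - p_t_v
    det = p_t_h * p_r_v - p_r_h * p_t_v
    f_t = (p_r_v * f_h - p_r_h * f_v) / det      # CVs from (121)
    f_r = (p_t_h * f_v - p_t_v * f_h) / det
    m_t = (math.sqrt(p_t_h), math.sqrt(p_t_v))   # diagonal of M_t
    m_r = (math.sqrt(p_r_h), math.sqrt(p_r_v))   # diagonal of M_r

    def weight(m):
        amp = sum(zc.conjugate() * mc * xc for zc, mc, xc in zip(z, m, x))
        return abs(amp) ** 2                     # |<z|M|x>|^2

    w_t, w_r = weight(m_t), weight(m_r)
    return (f_t * w_t + f_r * w_r) / (w_t + w_r)

x = [complex(math.cos(2 * math.pi / 3)), complex(math.sin(2 * math.pi / 3))]
z = [complex(1 / math.sqrt(2)), complex(1 / math.sqrt(2))]

ambiguous = coverslip_conditioned_average(0.6, 0.4, 1.0, -1.0, x, z)
projective = coverslip_conditioned_average(1.0, 0.0, 1.0, -1.0, x, z)
print(ambiguous, projective)
```

For the ambiguous detector the average is about −3.30, outside the eigenvalue range of $h - v$, while the perfectly correlated detector gives −1/2, inside the range.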



2. Example: Calcite polarization detector 

We can also measure polarization using a von Neumann
measurement that uses a detector with a continuous
sample space, such as position. For example, passing a
beam of polarized light through a calcite crystal will
continuously separate the polarizations $h$ and $v$ along a
particular position axis. Measuring the position profile
of the resulting split beam along that axis allows
information to be gained about the polarization.

[Figure 3 panels: $f_Y(y)$ for $\epsilon = 1$, $\epsilon = 0.1$, and $\epsilon = 0.02$.]

FIG. 3. Preferred CVs $f_Y(y)$ given in (132c) for a calcite position measurement that targets the polarization observable
$F_X = h - v$, shown for strong separation ($\epsilon = 1$), wimpy separation ($\epsilon = 0.1$), and weak separation ($\epsilon = 0.02$) of the
polarizations. Top Row: Initial Gaussian beam profile. Middle Row: Initial Laplace beam profile. Bottom Row: Initial
top-hat beam profile. Note that the top-hat CVs are the eigenvalues of ±1 under strong separation, but become amplified as
the distributions start to overlap; moreover, the top-hat CVs cancel out in the perfectly ambiguous overlapping region. The
amplification and cancellation behavior of the CVs is more complicated for less definite detector profiles.

For such a setup, measuring the position with a linear
scale corresponds to measuring a detector observable
$Q = \int_Y y\, d|y\rangle\langle y|$ for a continuous sample space
of distinguishable positions. The observable $Q$ has a
conjugate $D_Q$ that satisfies $[Q, D_Q] = i 1_Y$. The conjugate
can thus generate translations in $Q$ with a unitary
rotor, $\exp(iqD_Q)\,Q\,\exp(-iqD_Q) = Q + [iqD_Q, Q] +
\tfrac{1}{2}[iqD_Q, [iqD_Q, Q]] + \cdots = Q + q 1_Y$. Hence, we can model
the calcite crystal as a rotation governed by a unitary
rotor of the form

\[
U = \exp\bigl(-i(\epsilon_h h - \epsilon_v v) D_Q\bigr), \tag{129}
\]

which will translate $h$ polarization by some amount $\epsilon_h$
while simultaneously translating $v$ polarization by some
amount $\epsilon_v$ in the opposing direction. The parameters $\epsilon_h$
and $\epsilon_v$ will depend on the geometry of the crystal with
respect to the incident beam.

Suppose the light beam has an initially pure beam profile
state described by a density $\rho = |\psi\rangle\langle\psi|$. The probability
for obtaining a particular pure position $y = |y\rangle\langle y|$
in the profile would then be $dP_Y(y) = p_Y(y)\,dy =
\mathrm{Tr}(\rho\, y)\,dy = |\langle y|\psi\rangle|^2 dy$. Each complex factor $\langle y|\psi\rangle$ is the
"wave function" of the transverse beam profile, whose
complex square is the probability density with respect to
the integral, $p_Y(y) = |\langle y|\psi\rangle|^2$.

If we then pass the beam through the crystal described
by the rotor (129) and measure its position in a pure
position state $y = |y\rangle\langle y|$, we will have enacted a pure
operation on the polarization of the beam that is characterized
by a single measurement operator,

\begin{align}
d\mathcal{E}_y(F_X) &= M(y)^\dagger F_X M(y)\,dy, \tag{130a}\\
M(y) &= \langle y|U|\psi\rangle, \tag{130b}\\
     &= h\,\langle y - \epsilon_h|\psi\rangle + v\,\langle y + \epsilon_v|\psi\rangle, \nonumber
\end{align}

with components equal to the initial wave function of the
detector profile shifted in position by an appropriate $\epsilon$.
The pure measurement operations define a continuous set
of probability observables,

\begin{align}
dE(y) &= d\mathcal{E}_y(1_X) = M(y)^\dagger M(y)\,dy, \tag{131}\\
      &= h\,dP_Y(y - \epsilon_h) + v\,dP_Y(y + \epsilon_v), \nonumber
\end{align}

with components equal to the initial transverse beam profile
shifted in position by an appropriate $\epsilon$. Unless the
shifts become degenerate with $\epsilon_h = -\epsilon_v$, these probability
observables can be used to indirectly measure any
observable in the framework of $h$ and $v$.

Since the observable $\epsilon_h h - \epsilon_v v$ appears as a generator
for the rotation $U$, it could be tempting to assert that
the detector must specifically measure this observable.
However, only the framework in which the generating
observable is defined determines which observables can
be measured. The choice of CVs, which can be made
in postprocessing, will calibrate the detector to measure
specific observables in that framework.

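We considered a classical version of similar probability
observables in Sec. II D 4. Generalizing that derivation
only slightly, we can find the preferred contextual values
(CVs) $f_Y(y)$ for an arbitrary polarization observable
$F_X = f_X(h)h + f_X(v)v$,

\begin{align}
f_Y(y) &= f_X(h)\,\frac{v_+(y) + v_-(y)}{2} + f_X(v)\,\frac{v_+(y) - v_-(y)}{2}, \tag{132a}\\
v_+(y) &= \frac{p_Y(y - \epsilon_h) + p_Y(y + \epsilon_v)}{a + b(\epsilon_h, \epsilon_v)}, \tag{132b}\\
v_-(y) &= \frac{p_Y(y - \epsilon_h) - p_Y(y + \epsilon_v)}{a - b(\epsilon_h, \epsilon_v)}, \tag{132c}\\
a &= \int_Y p_Y^2(y)\,dy, \tag{132d}\\
b(\epsilon_h, \epsilon_v) &= \int_Y p_Y(y - \epsilon_h)\,p_Y(y + \epsilon_v)\,dy. \tag{132e}
\end{align}

In particular, one can measure the orthogonal observables
$h - v$ and $1_X$ using the expansions,

\begin{align}
h - v &= \int_Y v_-(y)\,dE(y), \tag{133}\\
h + v &= \int_Y v_+(y)\,dE(y). \tag{134}
\end{align}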



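For the specific case of an initial Gaussian beam centered
at zero, we have,

\begin{align}
p_Y(y) &= \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{y^2}{2\sigma^2}\right), \tag{135a}\\
\epsilon &= (\epsilon_h + \epsilon_v)/2, \tag{135b}\\
\delta &= (\epsilon_h - \epsilon_v)/2, \tag{135c}\\
a &= \frac{1}{2\sigma\sqrt{\pi}}, \tag{135d}\\
b(\epsilon) &= a\,\exp\bigl(-(\epsilon/\sigma)^2\bigr), \tag{135e}\\
v_-(y) &= \sqrt{2}\,\exp\left(-\frac{(y-\delta)^2}{2\sigma^2}\right)\frac{\sinh\bigl((y-\delta)\epsilon/\sigma^2\bigr)}{\sinh\bigl(\epsilon^2/2\sigma^2\bigr)}, \tag{135f}\\
v_+(y) &= \sqrt{2}\,\exp\left(-\frac{(y-\delta)^2}{2\sigma^2}\right)\frac{\cosh\bigl((y-\delta)\epsilon/\sigma^2\bigr)}{\cosh\bigl(\epsilon^2/2\sigma^2\bigr)}. \tag{135g}
\end{align}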

What matters for the measurement is the average translation
$\epsilon$ away from the midpoint ($y = \delta$). The amplification
of the CVs is controlled by the parameter $\epsilon/\sigma$, which
serves as an indicator for the ambiguity of the measurement.
When the shift $\epsilon$ is large compared to the width
of the Gaussian $\sigma$, then $\epsilon/\sigma \gg 1$; the shifted Gaussians
for $h$ and $v$ are distinguishable; the CVs approach the
eigenvalues of the measurement; and the measurement
is unambiguous. When the shift is small compared to the
width of the Gaussian, then $\epsilon/\sigma \ll 1$, the Gaussians for
$h$ and $v$ largely overlap, the CVs diverge, and the
measurement is ambiguous. Fig. 3 shows the CVs (135f) for
the Gaussian initial beam profile, as well as for a Laplace
and top-hat profile for comparison.
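The Gaussian CVs (135f) can be checked numerically against the expansion (133): integrating $v_-(y)$ against the shifted profiles for $h$ and $v$ should return the eigenvalues +1 and −1 for any separation. The sketch below assumes the symmetric case $\delta = 0$, uses a plain trapezoidal rule, and all function names are hypothetical.

```python
import math

def gaussian(y, sigma):
    """Normalized Gaussian beam profile p_Y(y) as in (135a)."""
    return math.exp(-y * y / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def v_minus(y, eps, sigma):
    """Preferred CVs targeting h - v for the symmetric Gaussian case (delta = 0)."""
    return (math.sqrt(2) * math.exp(-y * y / (2 * sigma**2))
            * math.sinh(y * eps / sigma**2) / math.sinh(eps**2 / (2 * sigma**2)))

def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule; adequate for rapidly decaying integrands."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

def expand_h_minus_v(eps, sigma=1.0):
    """Coefficients of h and v in the integral of v_-(y) dE(y)."""
    lim = 12 * sigma + eps
    coeff_h = integrate(lambda y: v_minus(y, eps, sigma) * gaussian(y - eps, sigma), -lim, lim)
    coeff_v = integrate(lambda y: v_minus(y, eps, sigma) * gaussian(y + eps, sigma), -lim, lim)
    return coeff_h, coeff_v

for eps in (1.0, 0.1, 0.02):
    print(eps, expand_h_minus_v(eps), v_minus(1.0, eps, 1.0))
```

The coefficients stay at +1 and −1 for all three separations, while the CV evaluated at a fixed position grows as $\epsilon/\sigma$ shrinks, which is the amplification described above.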

This sort of detection protocol was used in the original
paper on weak values [17] in the form of a Stern-Gerlach
apparatus that measures spin analogously to polarization
using a continuous momentum displacement generated
by a magnetic field. The initial Gaussian beam profile
shifted an amount $\epsilon$ away from the midpoint of the initial
beam profile in a direction corresponding to the value
of the spin. Since the beam profile was symmetric about
its mean, the generic CVs $f_Y(y) = y/\epsilon$ were implicitly
assigned as a linear calibration of the detector, which targets
a specific observable analogous to $h - v$. Motivating
this implicit choice was the fact that when $\epsilon$ is sufficiently
small, the two overlapping Gaussians produce to a good
approximation a single resulting Gaussian with a shifted
mean consistent with such a linear scaling, as shown in
Fig. 4. That such a choice was being made was later
pointed out explicitly in before we identified the role 
of the CVs in [49] and derived the preferred form (135f).
The proposed spin measurement protocol was adapted to 
a polarization measurement using a calcite crystal, as we 
have developed in this section, and then verified experi- 
mentally di,!!!. 

To produce the weak value from the polarization measurement,
we postselect on a second measurement to form
a conditioned average. If the initial polarization state is
pure with a density $\rho = x = |x\rangle\langle x|$ and the final
postselection is also pure, $z = |z\rangle\langle z|$, then we have the form,

\[
{}_z\langle F_X \rangle_x = \frac{\int_Y f_Y(y)\,|\langle z|M(y)|x\rangle|^2\,dy}{\int_Y |\langle z|M(y)|x\rangle|^2\,dy}. \tag{136}
\]



If we choose the symmetric Gaussian case (135) with $\delta = 0$
and take the form of $M(y)$ without additional unitary
disturbance,

\[
M(y) = \frac{1}{\sqrt{\sigma\sqrt{2\pi}}}\left[h\,\exp\left(-\frac{(y-\epsilon)^2}{4\sigma^2}\right) + v\,\exp\left(-\frac{(y+\epsilon)^2}{4\sigma^2}\right)\right], \tag{137}
\]

and relate both pure states to the reference state $h$ via
unitary rotations as defined in (79), $x = \mathcal{U}_{\alpha,\beta,\gamma}(h)$ and
$z = \mathcal{U}_{\alpha',\beta',\gamma'}(h)$, then the postselected probability density
${}_zp_x(y)$ takes the form,

\begin{align}
|\langle z|M(y)|x\rangle|^2 &= \frac{\exp\left(-\frac{y^2+\epsilon^2}{2\sigma^2}\right)}{2\sigma\sqrt{2\pi}}
  \Bigl( (1 + \cos\beta\cos\beta')\cosh\frac{y\epsilon}{\sigma^2} \tag{138}\\
  &\quad + (\cos\beta + \cos\beta')\sinh\frac{y\epsilon}{\sigma^2}
   + \sin\beta\sin\beta'\cos(\gamma - \gamma') \Bigr). \nonumber
\end{align}

FIG. 4. Pre- and postselected detector probability densities ${}_zp_x(y)$ for the calcite position measurement (131), shown for strong
separation ($\epsilon = 1$), wimpy separation ($\epsilon = 0.1$), and weak separation ($\epsilon = 0.02$) of the polarizations. The preselection is
$x = |x\rangle\langle x|$ with associated vector $|x\rangle = \cos(4\pi/6)|h\rangle + \sin(4\pi/6)|v\rangle$. The postselection is $z = |z\rangle\langle z|$ with associated vector
$|z\rangle = (|h\rangle + |v\rangle)/\sqrt{2}$. Top Row: Initial Gaussian beam profile. Middle Row: Initial Laplace beam profile. Bottom Row: Initial
top-hat beam profile. Note that the Gaussian profile tilts to approximate a single shifted Gaussian under weak separation, as
leveraged in the weak measurement protocol introduced in [17].

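Choosing the CVs (135f) to target the observable $h - v$,
the conditioned average (136) then takes the form,

\begin{align}
{}_z\langle h - v \rangle_x &= \frac{\cos\beta + \cos\beta'}{1 + \cos\beta\cos\beta' + \Xi(\epsilon, \sigma)}, \tag{139a}\\
\Xi(\epsilon, \sigma) &= \sin\beta\sin\beta'\cos(\gamma - \gamma')\exp\left(-\frac{\epsilon^2}{2\sigma^2}\right). \tag{139b}
\end{align}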

The interference term $\Xi(\epsilon, \sigma)$ in the denominator is the
only part of the conditioned average that depends on the
details of the measurement context through the exponen- 
tial dependence on e/cr, which was also noted in [27ll9l|. 



This conditioned average can exceed the eigenvalue range
of the observable due to the combination of the amplified
CVs and the disturbance linking the incompatible frameworks
in the conditional probabilities. Fig. 5 shows the
Gaussian measurement of the conditioned average (139),
as well as top-hat and triangular measurements for comparison.

The conditioned average (139) has two limiting cases
that eliminate the explicit context-dependence: (1) In
the strong-measurement limit, $\epsilon/\sigma \to \infty$, the interference
term vanishes, leaving a conditioned average of projective
measurements that always stays in the eigenvalue range
of the observable. (2) In the weak-measurement limit,
$\epsilon/\sigma \to 0$, the conditioned average reduces to the weak
value,

\begin{align}
{}_z\langle h - v \rangle^w_x &= \mathrm{Re}\,\frac{\langle z|(h - v)|x\rangle}{\langle z|x\rangle}, \tag{140}\\
  &= \frac{\cos\beta + \cos\beta'}{1 + \cos\beta\cos\beta' + \sin\beta\sin\beta'\cos(\gamma - \gamma')}. \nonumber
\end{align}



The weak value is distinguished by being the only case
that can be written entirely in terms of the observable,
the postselection, and the preselected state without
reference to the intermediate measurement. In this sense, it
is the only context-independent form of the conditioned
average. However, we shall see in Sec. III E that the weak
value is not guaranteed as a limit point of the conditioned
average in the weak-measurement limit.

[Figure 5 panels: $f_Y(y)\,{}_zp_x(y)$ for $\epsilon = 1$, $\epsilon = 0.1$, and $\epsilon = 0.02$.]

FIG. 5. Pre- and postselected conditioned average densities $f_Y(y)\,{}_zp_x(y)$ for a calcite position measurement targeting the
observable $F_X = h - v$ with CVs as in Fig. 3, shown for strong separation ($\epsilon = 1$), wimpy separation ($\epsilon = 0.1$), and weak
separation ($\epsilon = 0.02$) of the polarizations. The conditioned averages ${}_z\langle F_X\rangle_x = \int_Y f_Y(y)\,{}_zp_x(y)\,dy$ are the areas under the
curves and are shown inset. As in Fig. 4, the preselection is $x = |x\rangle\langle x|$, where $|x\rangle = \cos(4\pi/6)|h\rangle + \sin(4\pi/6)|v\rangle$. The
postselection is $z = |z\rangle\langle z|$, where $|z\rangle = (|h\rangle + |v\rangle)/\sqrt{2}$. Top Row: Initial Gaussian beam profile. Middle Row: Initial Laplace
beam profile. Bottom Row: Initial top-hat beam profile. For sufficiently strong separation all three detector profiles will
produce the strong conditioned average ${}_z\langle F_X\rangle_x = -1/2$. For weak separation all three profiles approximate the weak value
${}_z\langle F_X\rangle^w_x = -2 - \sqrt{3} \approx -3.73$. However, the different detector profiles converge to the weak value at different rates with
decreasing $\epsilon$.
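The two limiting cases of the Gaussian conditioned average can be explored with a short sketch of the closed form (139). It assumes the parametrization $|x\rangle = \cos(\beta/2)|h\rangle + e^{i\gamma}\sin(\beta/2)|v\rangle$ (and likewise for $z$), which is an assumption about the rotation convention of (79); the function name is hypothetical. With that convention the states of Figs. 4 and 5 correspond to $\beta = 4\pi/3$, $\beta' = \pi/2$, and $\gamma = \gamma' = 0$.

```python
import math

def gaussian_conditioned_average(beta, beta_p, dgamma, eps_over_sigma):
    """Closed-form conditioned average of h - v for the Gaussian calcite detector.

    beta, beta_p   : polar angles of pre- and postselection (assumed convention)
    dgamma         : relative phase gamma - gamma'
    eps_over_sigma : measurement strength epsilon / sigma
    """
    xi = (math.sin(beta) * math.sin(beta_p) * math.cos(dgamma)
          * math.exp(-eps_over_sigma**2 / 2))       # interference term
    return (math.cos(beta) + math.cos(beta_p)) / (
        1 + math.cos(beta) * math.cos(beta_p) + xi)

beta, beta_p = 4 * math.pi / 3, math.pi / 2
strong = gaussian_conditioned_average(beta, beta_p, 0.0, 10.0)
weak = gaussian_conditioned_average(beta, beta_p, 0.0, 1e-4)
print(strong, weak)
```

The strong limit returns −1/2 and the weak limit approaches $-2-\sqrt{3}$, matching the values quoted in the Fig. 5 caption; intermediate strengths interpolate between the two through the exponential factor alone.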



3. Example: Three-box paradox 

We can also use contextual values and the general conditioned
average to analyze an often repeated paradox
related to the logic of weak values: the three-box paradox
[92-95]. Suppose one has three boxes, only one of
which may be occupied by some quantum particle. The
boxes form a classical sample space, $X = \{a, b, c\}$, with
Boolean algebra $\Sigma_X = \{0, a, b, c, a + b, b + c, c + a, 1_X\}$,
with $1_X = a + b + c$. Suppose that the boxes are preselected
in the pure state with density $x = |x\rangle\langle x|$ and associated
Hilbert space vector $|x\rangle = (|a\rangle + |b\rangle + |c\rangle)/\sqrt{3}$ and
then later postselected with the pure projector $z = |z\rangle\langle z|$
and associated vector $|z\rangle = (|a\rangle + |b\rangle - |c\rangle)/\sqrt{3}$. The
postselected state has a transition probability from the
preselected state of $D_x(z) = |\langle z|x\rangle|^2 = 1/9$.

According to the weak value definition (119), the weak
values of the box-occupation observables for this pre- and
postselected situation are,

\begin{align}
{}_z\langle a \rangle^w_x &= 1, \tag{141a}\\
{}_z\langle b \rangle^w_x &= 1, \tag{141b}\\
{}_z\langle c \rangle^w_x &= -1. \tag{141c}
\end{align}

These values have occasionally been interpreted as
the counterfactual conditional probabilities of box-occupation
given the double boundary conditions; that
is, the box-occupation was not checked in between the
pre- and postselection, but if it had been checked without
disturbing the system, then these probabilities would
have been observed. Part of the paradox is that the weak
value for $c$ is negative, despite the fact that the eigenvalues
for the occupation projector $c$ are 1 and 0 and cannot
produce such a negative conditioned average unless
negative conditional probabilities average the eigenvalues.
Moreover, if the weak values do represent counterfactual
probabilities, then the weak values for $a$ and $b$ both indicate
a counterfactual certainty of occupation, and hence
require a negative counterfactual probability for $c$ to
correctly maintain the probability normalization condition.

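Operationally, the weak value is an idealized limit
point of a pre- and postselected conditioned average.
Since measuring it is not strictly achievable in the laboratory,
we prefer to analyze this situation by considering a
specific measurement context containing experimentally
observable quantities. In particular, we shall consider a
detector for the three-box occupation that has the three
outcomes 1, 2, and 3. The measurement operations are
fully characterized by the single measurement operators,

\begin{align}
M_1 &= a\sqrt{(1+\epsilon)/3} + b\sqrt{(1-\epsilon)/3} + c\sqrt{1/3}, \tag{142a}\\
M_2 &= a\sqrt{(1-\epsilon)/3} + b\sqrt{1/3} + c\sqrt{(1+\epsilon)/3}, \tag{142b}\\
M_3 &= a\sqrt{1/3} + b\sqrt{(1+\epsilon)/3} + c\sqrt{(1-\epsilon)/3}, \tag{142c}
\end{align}

corresponding to the probability observables $E_1 = M_1^2$,
$E_2 = M_2^2$, and $E_3 = M_3^2$. For the particular pre-
and postselection under consideration, these measurement
operators produce the generalized ABL conditional
probabilities,

\begin{align}
{}_zP_x(1) &= \frac{\mathrm{Tr}(z M_1 x M_1)}{\sum_{i=1}^{3}\mathrm{Tr}(z M_i x M_i)}, \tag{143a}\\
  &= \frac{3 - 2\sqrt{1+\epsilon} - 2\sqrt{1-\epsilon} + 2\sqrt{1-\epsilon^2}}{9 - 2\sqrt{1+\epsilon} - 2\sqrt{1-\epsilon} - 2\sqrt{1-\epsilon^2}}, \nonumber\\
  &= \frac{1}{3} - \frac{\epsilon^2}{3} + O(\epsilon^3), \nonumber\\
{}_zP_x(2) &= \frac{\mathrm{Tr}(z M_2 x M_2)}{\sum_{i=1}^{3}\mathrm{Tr}(z M_i x M_i)}, \tag{143b}\\
  &= \frac{3 - 2\sqrt{1+\epsilon} + 2\sqrt{1-\epsilon} - 2\sqrt{1-\epsilon^2}}{9 - 2\sqrt{1+\epsilon} - 2\sqrt{1-\epsilon} - 2\sqrt{1-\epsilon^2}}, \nonumber\\
  &= \frac{1}{3} - \frac{2\epsilon}{3} + \frac{\epsilon^2}{6} + O(\epsilon^3), \nonumber\\
{}_zP_x(3) &= \frac{\mathrm{Tr}(z M_3 x M_3)}{\sum_{i=1}^{3}\mathrm{Tr}(z M_i x M_i)}, \tag{143c}\\
  &= \frac{3 + 2\sqrt{1+\epsilon} - 2\sqrt{1-\epsilon} - 2\sqrt{1-\epsilon^2}}{9 - 2\sqrt{1+\epsilon} - 2\sqrt{1-\epsilon} - 2\sqrt{1-\epsilon^2}}, \nonumber\\
  &= \frac{1}{3} + \frac{2\epsilon}{3} + \frac{\epsilon^2}{6} + O(\epsilon^3). \nonumber
\end{align}

These detection probabilities are all positive and well-formed,
since they are operationally accessible quantities.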

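If we target a particular observable $O_X = o_X(a)a +
o_X(b)b + o_X(c)c$ for the three boxes, we can solve for the
appropriate CVs by inverting the matrix equation,

\[
\begin{pmatrix} o_X(a) \\ o_X(b) \\ o_X(c) \end{pmatrix}
= \frac{1}{3}\begin{pmatrix} 1+\epsilon & 1-\epsilon & 1 \\ 1-\epsilon & 1 & 1+\epsilon \\ 1 & 1+\epsilon & 1-\epsilon \end{pmatrix}
\begin{pmatrix} o_Y(1) \\ o_Y(2) \\ o_Y(3) \end{pmatrix}, \tag{144}
\]

producing,

\[
\begin{pmatrix} o_Y(1) \\ o_Y(2) \\ o_Y(3) \end{pmatrix}
= \frac{o_X(a) + o_X(b) + o_X(c)}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
+ \frac{1}{\epsilon}\begin{pmatrix} o_X(a) - o_X(b) \\ o_X(c) - o_X(a) \\ o_X(b) - o_X(c) \end{pmatrix}. \tag{145}
\]

In particular, we can use these CVs to expand the box-occupation
observables in terms of the probability observables,

\begin{align}
a &= \left(\frac{1}{3} + \frac{1}{\epsilon}\right)E_1 + \left(\frac{1}{3} - \frac{1}{\epsilon}\right)E_2 + \frac{1}{3}E_3, \tag{146a}\\
  &= \frac{1}{3}1_X + \frac{1}{\epsilon}(E_1 - E_2), \nonumber\\
b &= \left(\frac{1}{3} - \frac{1}{\epsilon}\right)E_1 + \frac{1}{3}E_2 + \left(\frac{1}{3} + \frac{1}{\epsilon}\right)E_3, \tag{146b}\\
  &= \frac{1}{3}1_X + \frac{1}{\epsilon}(E_3 - E_1), \nonumber\\
c &= \frac{1}{3}E_1 + \left(\frac{1}{3} + \frac{1}{\epsilon}\right)E_2 + \left(\frac{1}{3} - \frac{1}{\epsilon}\right)E_3, \tag{146c}\\
  &= \frac{1}{3}1_X + \frac{1}{\epsilon}(E_2 - E_3). \nonumber
\end{align}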

Hence, all three box-occupation observables can be mea- 
sured simultaneously from the same set of probabilities 
for the three detector outcomes. Notably, the CVs as- 
signed to each outcome can be negative for sufficiently 
small e, even though all eigenvalues are positive or zero. 
Hence the values being averaged can be negative and thus 
can lead to negative averages in principle. 
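The CV assignment (145) can be verified by substituting it back into the matrix relation (144). The following sketch does this for the occupation observable of box $a$; the function names are hypothetical and the value of $\epsilon$ is arbitrary.

```python
def box_cvs(o_a, o_b, o_c, eps):
    """CVs (o_Y(1), o_Y(2), o_Y(3)) following the pattern of (145)."""
    mean = (o_a + o_b + o_c) / 3
    return (mean + (o_a - o_b) / eps,
            mean + (o_c - o_a) / eps,
            mean + (o_b - o_c) / eps)

def reconstruct(cvs, eps):
    """Apply the matrix of (144) to the CVs to recover (o_X(a), o_X(b), o_X(c))."""
    s = [[(1 + eps) / 3, (1 - eps) / 3, 1 / 3],
         [(1 - eps) / 3, 1 / 3, (1 + eps) / 3],
         [1 / 3, (1 + eps) / 3, (1 - eps) / 3]]
    return [sum(s[i][j] * cvs[j] for j in range(3)) for i in range(3)]

eps = 0.1
cvs_a = box_cvs(1.0, 0.0, 0.0, eps)   # target the projector onto box a
print(cvs_a)                          # one CV is negative for small eps
print(reconstruct(cvs_a, eps))        # approximately [1, 0, 0]
```

For $\epsilon = 0.1$ the CVs for box $a$ are roughly (10.33, −9.67, 0.33): the second is negative even though the eigenvalues of the projector are 0 and 1.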

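Computing the appropriate conditioned averages we
find to $O(\epsilon^3)$,

\begin{align}
{}_z\langle a \rangle_x &= 1 - \frac{\epsilon}{2} - \frac{\epsilon^2}{4}, \tag{147a}\\
{}_z\langle b \rangle_x &= 1 + \frac{\epsilon}{2} - \frac{\epsilon^2}{4}, \tag{147b}\\
{}_z\langle c \rangle_x &= -1 + \frac{\epsilon^2}{2}, \tag{147c}
\end{align}

which shows that the weak values (141) are the $\epsilon \to 0$
limit of the conditioned averages with this specific
measurement context.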

The paradox of the negative weak value (141) can
therefore be largely resolved in the following sense: the
combination of the amplified negative CVs and the disturbance
in the detector probabilities linking pre- and
postselection frameworks leads to the negative result for
${}_z\langle c \rangle_x$ given sufficiently small $\epsilon$. No negative probabilities
are required to obtain the negative limit point since negative
CVs are being averaged in the weak limit and not
eigenvalues. All operationally accessible probabilities are
positive and well-behaved: the negative CVs are assigned
by the experimenter and highlighted by the disturbance
in the well-behaved probabilities.

We leave the reader to ponder how to interpret the
operationally accessible negative conditioned average
(147c). However, we note that with at least this
measurement context the conditioned averages do obey the
equality,

\[
{}_z\langle a \rangle_x + {}_z\langle b \rangle_x + {}_z\langle c \rangle_x = 1, \tag{148}
\]

for all values of $\epsilon$. The three sets of CVs sum to unity
for each detector outcome, leaving only the normalized
sum of detector probabilities ${}_zP_x(1) + {}_zP_x(2) + {}_zP_x(3) = 1$.
For more discussion of this paradox, see, for example. 
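The three-box conditioned averages can also be evaluated directly, without the series expansion, by combining the generalized ABL probabilities (143) with the expansions (146). In the sketch below the amplitudes $\langle z|M_i|x\rangle$ are written up to a common factor that cancels in the normalization; the function names are hypothetical.

```python
import math

def abl_probabilities(eps):
    """Generalized ABL probabilities for detector outcomes 1, 2, 3."""
    s, d = math.sqrt(1 + eps), math.sqrt(1 - eps)
    # <z|M_i|x> up to a common factor, for |x> ~ (1,1,1) and |z> ~ (1,1,-1)
    amps = [s + d - 1, d + 1 - s, 1 + s - d]
    weights = [amp * amp for amp in amps]
    total = sum(weights)
    return [w / total for w in weights]

def box_averages(eps):
    """Conditioned averages of the occupations a, b, c using the CVs of (146)."""
    p1, p2, p3 = abl_probabilities(eps)
    avg_a = 1 / 3 + (p1 - p2) / eps
    avg_b = 1 / 3 + (p3 - p1) / eps
    avg_c = 1 / 3 + (p2 - p3) / eps
    return avg_a, avg_b, avg_c

for eps in (0.5, 0.1, 0.001):
    a_avg, b_avg, c_avg = box_averages(eps)
    print(eps, a_avg, b_avg, c_avg, a_avg + b_avg + c_avg)
```

All three detector probabilities remain positive, the averages sum to one for every $\epsilon$ as in (148), and for small $\epsilon$ they approach the weak values 1, 1, and −1.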




E. Deriving the weak value 

Weak value controversy. — As we have seen for the case 
of the calcite detector (140) and the three-box paradox
(141), the weak value (118) seems to arise naturally
as the weak limit of postselected conditioned
averages. Indeed, much of the existing literature on weak
values (e.g. [Mill [M^) operates under the as- 
sumption that it is the only weak limit of a conditioned 
average, or that it is a well-defined property of a pre- 
and postselected ensemble prior to the ensemble being 
measured. However, a conditioned average does not nec- 
essarily converge to the weak value in the weak measure- 
ment limit, as has been noted independently by several 
groups ^ m, 113, m m H^, ma^ng its interpretation 
as a well-defined property worthy of more careful con- 
sideration. To obtain correct laboratory predictions for 
a conditioned average, the formula (115) must be used,
which generally requires the specification of the detection 
strategy and the protocol for assigning CVs to target a 
specific observable. 

Despite the interpretational controversy, the weak
value (118) is distinguished by being a context-independent
weak limit of the conditioned average that is easy
to compute theoretically and appears quite commonly in
typical laboratory situations. The formal expression of
the weak value can also appear in other measurement
scenarios, such as in "modular values" [97], or even
perturbative corrections to energy spectra [98], which makes
it an independently interesting quantity to study.

We will now demonstrate how the weak value (118) can
be uniquely defined from the general conditioned average
(115) by imposing a set of sufficient conditions that the
measurement should satisfy.

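Preliminaries. — First we note from (95c) that each
measurement operator has a polar decomposition,
$M_{y,y'} = U_{y,y'}|M|_{y,y'}$, in terms of a unitary operator $U_{y,y'}$
and a positive operator $|M|_{y,y'}$. It then follows that,

\begin{align}
M^\dagger_{y,y'} E'_z M_{y,y'} &= |M|_{y,y'} U^\dagger_{y,y'} E'_z U_{y,y'} |M|_{y,y'}, \tag{149}\\
  &= \{|M|^2_{y,y'}, \mathcal{U}_{y,y'}(E'_z)\}/2 \nonumber\\
  &\quad - [|M|_{y,y'}, [|M|_{y,y'}, \mathcal{U}_{y,y'}(E'_z)]]/2, \nonumber
\end{align}

where $\{A, B\} = AB + BA$ is the anticommutator,
$[A, B] = AB - BA$ is the commutator, and $\mathcal{U}_{y,y'}(E'_z) =
U^\dagger_{y,y'} E'_z U_{y,y'}$ is a unitary rotation of the postselection.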

Sufficient conditions. — Next we make the following sufficient
assumptions regarding the dependence of the relevant
quantities on the measurement strength parameter
$\epsilon$:

1. The measurement operators $M_{y,y'}$ are analytic
functions of $\epsilon$, and thus have well-defined Taylor
expansions around $\epsilon = 0$ such that they are
proportional to the identity in the weak limit,
$\forall y, y'$, $\lim_{\epsilon\to 0} M_{y,y'} \propto 1_X$.

2. The unitary parts of the measurement operators
$U_{y,y'} = \exp(iG_{y,y'}(\epsilon))$ are generated by Hermitian
operators of order $\epsilon^k$, $G_{y,y'}(\epsilon) = \epsilon^k G^{(k)}_{y,y'} +
O(\epsilon^{k+1})$, for some integer $k \geq 1$. Furthermore,
each $U_{y,y'}$ must commute with either the system
state or the postselection, $\forall y, y'$, $[U_{y,y'}, \rho_X] = 0$, or
$\forall y, y'$, $[U_{y,y'}, E'_z] = 0$.

3. The equality $F_X = \sum_y f_Y(\epsilon; y) E_y(\epsilon)$ must be satisfied,
where the CVs $f_Y(\epsilon; y)$ are selected according
to the pseudoinverse prescription.

4. The minimum nonzero order in $\epsilon$ for all $|M|_{y,y'}(\epsilon)$ is
$\epsilon^n$ such that assumption (3) can also be satisfied for
some CVs by the truncation to order $\epsilon^n$. That is,
for all $y, y'$, $|M|_{y,y'} = c_{y,y'} 1_X + |M|^{(n)}_{y,y'}\epsilon^n + O(\epsilon^{n+1})$,
where $\sum_{y'} c^2_{y,y'} = P_Y(y)$ is the detector probability
in absence of interaction, and some of the $|M|^{(n)}_{y,y'}$
may vanish.

5. The probability observables $E_y(\epsilon) =
\sum_{y'} M^\dagger_{y,y'}(\epsilon) M_{y,y'}(\epsilon)$ commute with the observable
$F_X$.

Theorem. — Given the above sufficient conditions, we
have the following theorem: in the weak limit $\epsilon \to 0$
the context dependence of the conditioned average (115)
vanishes and the weak value (118) is uniquely defined.

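Proof. — To prove the theorem, we expand (115) to the
minimum necessary order of $\epsilon^n$ and then take the weak
limit as $\epsilon \to 0$. First, we expand (149) to order $\epsilon^n$ using
assumptions (1) and (4),

\begin{align}
M^\dagger_{y,y'} E'_z M_{y,y'} &= c^2_{y,y'}\,\mathcal{U}_{y,y'}(E'_z) \tag{150}\\
  &\quad + c_{y,y'}\{|M|^{(n)}_{y,y'}, \mathcal{U}_{y,y'}(E'_z)\}\epsilon^n + O(\epsilon^{n+1}). \nonumber
\end{align}

Generally, the remaining unitary rotation of the postselection
will disturb the weak limit. However, if
$[U_{y,y'}, E'_z] = 0$ as in assumption (2), then $\mathcal{U}_{y,y'}(E'_z) =
E'_z$ and the unitary disturbance disappears. If instead
$[U_{y,y'}, \rho_X] = 0$, then we can apply the state to (150)
and find,

\begin{align}
\langle M^\dagger_{y,y'} E'_z M_{y,y'} \rangle_X &= c^2_{y,y'}\langle \mathcal{U}_{y,y'}(E'_z) \rangle_X \tag{151}\\
  &\quad + c_{y,y'}\langle\{|M|^{(n)}_{y,y'}, \mathcal{U}_{y,y'}(E'_z)\}\rangle_X\,\epsilon^n \nonumber\\
  &\quad + O(\epsilon^{n+1}). \nonumber
\end{align}

Since $\langle \mathcal{U}_{y,y'}(E'_z) \rangle_X = \mathrm{Tr}_X(\mathcal{U}^\dagger_{y,y'}(\rho_X) E'_z) = \langle E'_z \rangle_X$, the
first term simplifies. The unitary rotation in the second
term expands to $\mathcal{U}_{y,y'}(E'_z) = E'_z + O(\epsilon^k)$, and the $O(\epsilon^k)$
correction can be absorbed into the overall $O(\epsilon^{n+1})$ correction.

Therefore, after summing over $y'$ we find up to corrections
of order $\epsilon^{n+1}$,

\[
\sum_{y'} \langle M^\dagger_{y,y'} E'_z M_{y,y'} \rangle_X = \langle \{E_y(\epsilon), E'_z\} \rangle_X / 2, \tag{152}
\]

where the probability observable has the expansion to
order $\epsilon^n$,

\begin{align}
E_y(\epsilon) &= \sum_{y'} |M|^2_{y,y'}(\epsilon) \tag{153}\\
  &= \sum_{y'} \bigl(c^2_{y,y'} 1_X + 2c_{y,y'}|M|^{(n)}_{y,y'}\epsilon^n + O(\epsilon^{n+1})\bigr). \nonumber
\end{align}

Inserting (152) into (115), we find,

\[
{}_z\langle F_X \rangle_X = \frac{\langle \{F_X, E'_z\}/2 \rangle_X + \sum_y f_Y(\epsilon; y)\,O(\epsilon^{n+1})}{\langle \{1_X, E'_z\}/2 \rangle_X + O(\epsilon^{n+1})}, \tag{154}
\]

where we have simplified $\sum_y f_Y(\epsilon; y) E_y(\epsilon) = F_X$ in the
numerator, and $\sum_y E_y(\epsilon) = 1_X$ in the denominator.
Hence, unless the CVs in the numerator have poles larger
than $1/\epsilon^n$, the correction terms of order $\epsilon^{n+1}$ will vanish,
producing (118) in the weak limit $\epsilon \to 0$, as claimed. The
last step in obtaining (118), therefore, is to show that the
pseudoinverse solution for $f_Y$ that was indicated by
assumption (3) cannot have poles larger than $1/\epsilon^n$. The
following lemmas will show this, which will prove the
main theorem.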

Lemma preliminaries. — First, we note that $F_X$ commutes
with $\{E_y(\epsilon)\}$ by assumption (5). As such, we will
replace the CV definition $F_X = \sum_y f_Y(\epsilon; y) E_y(\epsilon)$ with
an equivalent matrix equation,

\[
\vec{f}_X = S \vec{f}_Y. \tag{155a}
\]

The pseudoinverse is constructed from the singular value
decomposition $S = U\Sigma V^T$ as $S^+ = V\Sigma^+ U^T$, where $U$ and
$V$ are orthogonal matrices such that $U^T U = V V^T = 1$,
$\Sigma$ is the singular value matrix composed of the square
roots of the eigenvalues of $S S^T$, and $\Sigma^+$ is composed of
the inverse nonzero elements in $\Sigma^T$.

Next, we note that the truncation of the matrix $S$ to
order $\epsilon^n$ has the form,

\[
S' = \mathcal{P} + \epsilon^n S_n, \tag{156}
\]

where $\mathcal{P} = (P_Y(y_1)\vec{1}, \dots)$ is a matrix whose rows are identical
and whose columns contain the interaction-free detector
probabilities $P_Y(y)$, and $S_n = (\vec{E}^{(n)}_1, \dots)$ is a matrix
whose rows all sum to zero. Furthermore, since the
solution to the equation $\vec{f}_X = S' \vec{f}_Y$ is assumed to exist
by assumption (4), then the dimension of the detector,
$N$, must be greater than or equal to the dimension of the
system, $M$. We then have the following two lemmas.

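Lemma 1. — The singular values of the truncated matrix
$S'$ have maximum leading order $\epsilon^n$.

Proof. — The singular values of $S'$ are $\sigma_k = \sqrt{\lambda_k}$, where
$\lambda_k$ are $M$ eigenvalues of $\mathcal{H} = S'^T S'$, with its other $N - M$
eigenvalues being zero. Since $\mathcal{P}^T S_n = 0$, this matrix
has the simple form $\mathcal{H} = \mathcal{P}^T\mathcal{P} + \epsilon^{2n} S_n^T S_n$, where
$(\mathcal{P}^T\mathcal{P})_{ij} = M P_Y(i) P_Y(j)$ is $M\|\vec{p}\|^2$ times the projection
operator onto the probability vector $\vec{p} = (P_Y(y), \dots)$
and $(S_n^T S_n)_{ij} = \vec{E}^{(n)}_i \cdot \vec{E}^{(n)}_j$. We will use $\mathcal{H}$ to determine
the singular values of $S'$.

Differentiating the eigenvalue relation
$\mathcal{H}(\epsilon^{2n})\vec{u}_k(\epsilon^{2n}) = \lambda_k(\epsilon^{2n})\vec{u}_k(\epsilon^{2n})$ and the eigenvector
normalization condition $\vec{u}_k(\epsilon^{2n}) \cdot \vec{u}_k(\epsilon^{2n}) = 1$ with
respect to $\epsilon^{2n}$ produces the following deformation
equation that describes how the eigenvalues of $\mathcal{H}$
continuously change with increasing $\epsilon^{2n}$,

\[
\dot{\lambda}_k(\epsilon^{2n}) = \|S_n \vec{u}_k(\epsilon^{2n})\|^2. \tag{157}
\]

Integrating this equation produces the following perturbative
expansion of the eigenvalues for small $\epsilon$,

\[
\lambda_k(\epsilon^{2n}) = \lambda_k(0) + \epsilon^{2n}\|S_n \vec{u}_k(0)\|^2 + O(\epsilon^{4n}). \tag{158}
\]

Hence, to prove the lemma it is sufficient to show that
$\lambda_k(0)$ and $S_n \vec{u}_k(0)$ cannot both vanish unless $\lambda_k(\epsilon^{2n}) = 0$
for all $\epsilon$.

Since $\mathcal{H}(0) = \mathcal{P}^T\mathcal{P}$ is a projection operator, $\lambda_1(0) =
M\|\vec{p}\|^2$ is its only nonzero eigenvalue with associated
eigenvector $\vec{u}_1(0) = \vec{p}/\|\vec{p}\|$. Hence, $\sigma_1(\epsilon^{2n}) \approx \sqrt{M}\|\vec{p}\| > 0$
to leading order. For $k \neq 1$, $\lambda_k(0) = 0$ and $\vec{u}_k(0)$ can
be chosen arbitrarily to span the degenerate $(N - 1)$-dimensional
subspace orthogonal to $\vec{u}_1(0)$. Suppose
$S_n \vec{u}_k(0) = 0$ for some $k$. It follows that $\mathcal{H}(\epsilon^{2n})\vec{u}_k(0) =
\mathcal{P}^T\mathcal{P}\vec{u}_k(0) + \epsilon^{2n} S_n^T S_n \vec{u}_k(0) = 0$ since $\vec{u}_k(0)$ is orthogonal
to $\vec{u}_1(0) \propto \vec{p}$. Therefore, $\vec{u}_k(0)$ is an eigenvector
of $\mathcal{H}(\epsilon^{2n})$ with eigenvalue 0 for any $\epsilon$. Since $\mathcal{H}$ is symmetric,
its eigenvectors form an orthogonal set for any
$\epsilon$, so we must have the identification $\vec{u}_k(\epsilon^{2n}) = \vec{u}_k(0)$.
As a result, the associated eigenvalue vanishes for any $\epsilon$,
$\lambda_k(\epsilon^{2n}) = \lambda_k(0) = 0$, which proves the lemma.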

Lemma 2. — The pseudoinverse solution /y to (I155ap 
cannot have poles larger than 1/e". 
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Proof. — In order to satisfy (jl55a[) . we have the equiv- 
alent condition for each component oiU^fx = SV"^/y, 

(U^fx)k=^kkiV^fY)k- (159) 



Therefore, all singular values T,kk corresponding to 
nonzero components of fx must also be nonzero; 
we shall call these the relevant singular values. Singu- 
lar values which are not relevant will not contribute to 
the solution fy = V'S'^U^ fx ■ We can see this since 
(/y), = (VE+W^/x), = EfcV,fcS+ so any 
zero element of U"^ fx will eliminate the inverse irrele- 
vant singular value E^^, from the solution for {fy)]- 

Since the orthogonal matrices $U$ and $V$ do not contain
any poles, and since $\vec{f}_X$ is $\epsilon$-independent, the only
poles in the solution $\vec{f}_Y = S^+ \vec{f}_X = V \Sigma^+ U^T \vec{f}_X$ must
come from the inverses of the relevant singular values in
$\Sigma^+$. If a singular value $\Sigma_{kk}$ has leading order $\epsilon^m$, then its
inverse $\Sigma^+_{kk} = 1/\Sigma_{kk}$ has leading order $1/\epsilon^m$; therefore, to
have a pole of order higher than $1/\epsilon^n$ there must be
at least one relevant singular value with a leading order
greater than $\epsilon^n$. However, if that were the case then
the truncation $S'$ of $S$ to order $\epsilon^n$ could not satisfy (159),
since to that order it would have a relevant singular value
of zero according to the previous lemma, contradicting
assumption (4) about needing to satisfy the CV definition
with the minimum nonzero order in $\epsilon$. Therefore, the
pseudoinverse solution $\vec{f}_Y = S^+ \vec{f}_X$ can have no pole with
order higher than $1/\epsilon^n$ and the lemma is proven.
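
As a numerical illustration of this bound, the sketch below builds the pseudoinverse solution from the singular value decomposition and checks that $\epsilon^n \vec{f}_Y$ stays bounded as $\epsilon \to 0$. It is an assumption-laden example rather than the authors' calculation: the form $S(\epsilon) = \vec{1}\,\vec{p}^T + \epsilon^n S_n$, the numerical values of `p`, `S_n`, and `f_X`, and the helper name `pseudoinverse_cvs` are all hypothetical.

```python
import numpy as np

def pseudoinverse_cvs(p, S_n, f_X, eps, n=1):
    """Solve f_X = S f_Y via f_Y = V Sigma^+ U^T f_X for the assumed S = 1 p^T + eps^n S_n."""
    M = S_n.shape[0]
    S = np.outer(np.ones(M), p) + eps ** n * S_n
    U, sigma, Vt = np.linalg.svd(S)
    # All singular values are nonzero for this example, so Sigma^+ is a plain inverse.
    f_Y = Vt.T @ ((U.T @ f_X) / sigma)
    return S, sigma, f_Y

# Hypothetical data: weak-limit probabilities, a row-sum-zero correction, eigenvalues of X.
p = np.array([0.5, 0.3, 0.2])
S_n = np.array([[ 1.0, -1.0,  0.0],
                [ 0.0,  1.0, -1.0],
                [-1.0,  0.0,  1.0]])
f_X = np.array([1.0, 2.0, 4.0])

eps_values = [1e-2, 1e-3, 1e-4]
results = [pseudoinverse_cvs(p, S_n, f_X, eps) for eps in eps_values]
scaled_norms = [np.linalg.norm(eps * f_Y) for eps, (_, _, f_Y) in zip(eps_values, results)]

for eps, (S, sigma, f_Y), sn in zip(eps_values, results, scaled_norms):
    print(f"eps={eps:g}  smallest sigma={sigma.min():.3e}  ||eps * f_Y||={sn:.4f}")
```

The contextual values grow like $1/\epsilon$ while the rescaled norm approaches a constant, consistent with a leading pole of order $1/\epsilon^n$ for $n = 1$ and no higher.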

Exceptions. As the main theorem indicates, the weak
value will arise as the weak limit of a conditioned average
in many common laboratory situations, which explains
its seeming stability in the literature. However, if the
sufficiency conditions of the theorem are not met, then a
different weak limit may be found. For example, if there
is $\epsilon$-dependent unitary disturbance in the measurement,
then the postselection can be effectively rotated to a
different framework for each measurement outcome, which
creates additional terms in the weak limit. Similarly, if
the CVs are $\epsilon$-dependent and diverge more rapidly than
$1/\epsilon^n$, then additional terms will become relevant in the
weak limit. (See, for example, Ref. [35].) This latter
case can happen either from a pathological choice of CVs
by the experimenter in the case of redundancy, or from
a set of probability observables that cannot satisfy the
constraint equation $F_X = \sum_y f_Y(\epsilon; y)\, E_y(\epsilon)$ with their
lowest nonzero order in $\epsilon$. Such probability observables
that do not satisfy the constraint equation to lowest order
are poorly correlated with the observable in the weak
limit. We refer the reader to [53] for more discussion on
the uniqueness issue of weak values. The theorem presented
here is a slight generalization of the one presented
therein.



IV. CONCLUSION 

In this work, we have detailed the contextual-value 
approach to the generalized measurement of observables 
that we originally introduced in the letter by Dressel et al.,
Phys. Rev. Lett. 104, 240401 (2010), and further
developed in [51]-[53]. This approach completes the
well-established operational theory of state measurements by
directly relating the state-transformations to traditional 
observables. Each such operation typically corresponds 
to a distinguishable outcome of a correlated detection 
apparatus. An experimenter can construct an observ- 
able from such an apparatus by assigning values to its 
outcomes. The assigned values can be generally ampli- 
fied from the eigenvalues of the constructed observable 
due to ambiguity in the measurement, and thus form a 
generalized spectrum that depends on the specific mea- 
surement context. Hence, we call these values contextual 
values for the constructed observable that allow its indi- 
rect measurement using such a correlated detector. 
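
A minimal numerical sketch of this construction follows, using an invented two-outcome detector; the conditional probabilities `S`, the eigenvalues `f_X`, and the prior `P_x` are hypothetical numbers, not values from the paper. Solving $\vec{f}_X = S \vec{f}_Y$, with $S_{xy}$ the probability of outcome $y$ given system state $x$, gives assigned values that are amplified relative to the eigenvalues, while the detector average still reproduces the observable average.

```python
import numpy as np

# Hypothetical ambiguous detector: S[x, y] = probability of outcome y given state x.
S = np.array([[0.6, 0.4],
              [0.4, 0.6]])
f_X = np.array([1.0, -1.0])          # eigenvalues of the observable being constructed

# Contextual values: solve f_X = S f_Y (S is invertible here, so no pseudoinverse is needed).
f_Y = np.linalg.solve(S, f_X)        # approximately [5, -5], amplified beyond [-1, 1]

# Consistency check with a hypothetical prior distribution over system states.
P_x = np.array([0.7, 0.3])
P_y = P_x @ S                        # detector outcome probabilities
mean_from_detector = f_Y @ P_y       # average of the assigned values over outcomes
mean_of_observable = f_X @ P_x       # direct average of the observable

print(f_Y, mean_from_detector, mean_of_observable)
```

The weaker the correlation between outcomes and states (rows of `S` closer to each other), the larger the amplification of the assigned values.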

Constructing an observable using contextual values requires
only classical probability theory, according to (32).
Hence, the technique may be used wherever Bayesian 
probability theory applies. We have outlined an alge- 
braic approach to operational measurements from within 
Bayesian probability theory to encourage applications 
along these lines. 

We have also shown how to construct a quantum probability
space as the orbit of a classical probability space
under the special unitary group. This point of view
illustrates that quantum observables can be constructed
from contextual values in precisely the same way (112)
as their classical counterparts. The approach also
highlights the similarity between Lüders' rule (89) for updating
a quantum state and invasive classical conditioning
(13), which leads to a similarity between quantum
operations (95) and classically invasive measurement
operations (25). Numerous physical examples have been given.

By putting all observable measurements on the same
footing, the contextual values formalism subsumes not
only projective measurements but also weak measurements
as special cases. To emphasize this point, we have
analyzed the quantum weak measurement protocol
introduced by Aharonov et al. [13] in detail as an example
using a calcite crystal and a polarized laser beam.
We have also derived the quantum weak value (118) as
a limit point of a general pre- and postselected conditioned
average (115) as the measurement strength goes
to zero and have given sufficient conditions for the
convergence to hold. Like the classically invasive conditioned
average (37), the quantum conditioned average, with the
quantum weak value as a special case, can exceed the
eigenvalue bounds of the observable.

The use of contextual values considerably clarifies and 
formalizes the process of measuring observables, particu- 
larly within a laboratory setting. The elements of the for- 
malism directly describe operationally accessible quan- 
tities that can be tomographically calibrated. As such, 
the technique should be of considerable interest to
experimentalists working on measurement and control of both
quantum and classical systems. Furthermore, the for- 
malism prompts interesting theoretical questions about 
the foundations of quantum mechanics by highlighting 
its myriad similarities to classical probability theory. 